We consider a class of semiparametric regression models which 
are one-parameter extensions of the Cox [J. Roy. Statist. Soc. Ser. B 
34 (1972) 187-220] model for right-censored univariate failure times. 
These models assume that the hazard given the covariates and a ran- 
dom frailty unique to each individual has the proportional hazards 
form multiplied by the frailty. The frailty is assumed to have mean 1 
within a known one-parameter family of distributions. Inference is 
based on a nonparametric likelihood. The behavior of the likelihood 
maximizer is studied under general conditions where the fitted model 
may be misspecified. The joint estimator of the regression and frailty 
parameters as well as the baseline hazard is shown to be uniformly 
consistent for the pseudo-value maximizing the asymptotic limit of 
the likelihood. Appropriately standardized, the estimator converges 
weakly to a Gaussian process. When the model is correctly specified, 
the procedure is semiparametric efficient, achieving the semiparamet- 
ric information bound for all parameter components. It is also proved 
that the bootstrap gives valid inferences for all parameters, even un- 
der misspecification. We demonstrate analytically the importance of 
the robust inference in several examples. In a randomized clinical 
trial, a valid test of the treatment effect is possible when other prog- 
nostic factors and the frailty distribution are both misspecified. Under 
certain conditions on the covariates, the ratios of the regression pa- 
rameters are still identifiable. The practical utility of the procedure 
is illustrated on a non-Hodgkin's lymphoma dataset. 
1. Introduction. An objective of many medical studies is a predictive 
model for survival. The Cox (1972) model is popular for such analyses, be- 
cause of its theoretical properties and availability in software packages. Un- 
fortunately, in many practical settings the phenomenon under study is quite 
complicated and the assumed model is at best a working approximation 
to the truth. Consider the Non-Hodgkin's Lymphoma Prognostic Factors 
Project (1993) which analyzed data from a collection of cancer clinical trials. 
A system was developed to classify patients according to baseline character- 
istics. The scheme employs a proportional hazards model with five influen- 
tial covariates. The ordinal and continuous predictors are dichotomized for 
clinical interpretation. There are also important risk factors which are omit- 
ted, such as treatment center. Diagnostics show that the model fits poorly 
[Gray (2000)]. Furthermore, the survival estimates are quite biased by the 
misspecification. 

There are several alternatives to the Cox model which might improve the 
fit. These include additive hazards regression models [Aalen (1978, 1980) 
and Lin and Ying (1994)], accelerated failure time models [Tsiatis (1990) 
and Wei, Ying and Lin (1990)] and time-varying coefficient models [Sargent 
(1997)]. Additional models have been developed for covariate-dependent het- 
eroscedasticity and other departures from proportionality [Bagdonavicius 
and Nikulin (1999) and Hsieh (2001)]. 

Frailty models are a comparatively parsimonious representation which
generalizes the Cox model in a natural way. The misspecified and omitted
covariates are described by an unobservable random variable $\log(W)$ unique
to the linear predictor of each patient. Let $T$ be the failure time and
$Z = \{Z(t), t \ge 0\}$ a $d \times 1$ vector process of possibly
time-dependent covariates. Denote by $\lambda\{t; \tilde Z(t), W\}$ the hazard
function of $T$ conditionally on $\tilde Z(t) = \{Z(s), s \le t\}$ and $W$.
The proportional hazards frailty regression model is

(1.1)    $\lambda\{t; \tilde Z(t), W\} = a(t)\exp\{\log(W) + \beta' Z(t)\}$,

where $\beta$ is a $d \times 1$ regression parameter, $a(t)$ is an unspecified
base hazard function and prime (') denotes transpose. Taking $f(w;\gamma)$ to
be the Lebesgue density of a continuous frailty $W$, where $\gamma$ is an
unknown scalar, yields a
rich class of semiparametric models. This class excludes models with posi- 
tive probability of W = 0. Examples in the class include the inverse Gaus- 
sian frailty [Hougaard (1984)], the positive stable frailty [Hougaard (1986)], 
the log-normal frailty [McGilchrist and Aisbett (1991)], the power variance 
frailty [Aalen (1988)], the uniform frailty [Lee and Klein (1988)] and the 
threshold frailty [Lindley and Singpurwalla (1986)]. While the one-parameter 
extension (1.1) of the Cox model is unlikely to address all misspecification, 
it is a point of departure. The objective of this paper is to provide a rigorous 
foundation for inference within this class of models, adopting the point of 
view that any model is at best a working approximation to the truth. 
It is popular to let $W$ have a gamma distribution with mean 1 and variance
$\gamma$. With time-independent covariates, the model is equivalent to the
odds-rate regression [Dabrowska and Doksum (1988)],

(1.2)    $h(T) = -\beta' Z + \varepsilon_\gamma$,

where $h(t)$ is an unspecified strictly monotone increasing function, and
$\exp(\varepsilon_\gamma)$ has a Pareto($\gamma$) distribution. Fixing
$\gamma = 0$ gives proportional hazards, while $\gamma = 1$ gives proportional
odds [Bennett (1983)]. If $W$ has zero variance, then (1.1) reduces to the Cox
model and efficient estimation of $\beta$ is straightforward with the partial
likelihood [Andersen and Gill (1982)]. Estimation for the special case (1.2)
with $\gamma$ known has been studied extensively [Pettitt (1982, 1984), Cheng,
Wei and Ying (1995, 1997), Murphy, Rossini and van der Vaart (1997), Fine,
Ying and Wei (1998), Scharfstein, Tsiatis and Gilbert (1998), Shen (1998) and
Slud and Vonta (2004)]. When the parameter in the frailty distribution is
unknown, these methods are not applicable. Asymptotic theory for maximum
likelihood estimation of model (1.1) with clusters of size greater than or
equal to 2 and shared gamma $W$ having unknown $\gamma$ was derived by Parner
(1998). See Nielsen, Gill, Andersen and Sørensen (1992) and Murphy (1994,
1995) for related work. A unified theory for estimation in model (1.1) with
uncorrelated data and general frailty distribution is not available.
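
As a small numerical illustration (a sketch that is not part of the original analysis), the marginal survival function implied by a mean-one gamma frailty with variance $\gamma$ can be evaluated directly. The function name and the numerical values of $A(t)$ and $\beta'z$ below are hypothetical. The check shows that $\gamma = 1$ makes the failure odds proportional to $e^{\beta'z}$, while $\gamma$ near 0 recovers the proportional hazards form.

```python
import numpy as np

def gamma_frailty_survival(cum_hazard, lin_pred, gamma):
    """Marginal S(t|z) = (1 + gamma * A(t) * exp(beta'z))**(-1/gamma).

    gamma is the variance of a mean-one gamma frailty; gamma = 0 is
    treated as the Cox limit exp(-A(t) * exp(beta'z)).
    """
    h = np.asarray(cum_hazard, dtype=float) * np.exp(lin_pred)
    if gamma == 0:
        return np.exp(-h)
    return (1.0 + gamma * h) ** (-1.0 / gamma)

A = np.array([0.1, 0.5, 1.0, 2.0])   # illustrative values of A(t)
lin_pred = 0.7                        # illustrative value of beta'z

# gamma = 1: the failure odds (1 - S) / S equal A(t) * exp(beta'z).
s_po = gamma_frailty_survival(A, lin_pred, 1.0)
failure_odds = (1.0 - s_po) / s_po

# gamma near 0: the survival function approaches the Cox form.
s_small = gamma_frailty_survival(A, lin_pred, 1e-8)
s_cox = gamma_frailty_survival(A, lin_pred, 0.0)
```

Here `failure_odds` agrees with `A * np.exp(lin_pred)`, which is the proportional odds structure, and `s_small` is numerically indistinguishable from `s_cox`.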

In this paper the focus is on independent observations. The data setup 
and frailty model assumptions are given in Section 2. Bagdonavicius and 
Nikulin (1999) suggested ad hoc estimators for the parameters, but their 
large-sample properties were not established rigorously. The large-sample 
results in Parner (1998) can be adapted to the univariate gamma frailty 
setting with a correctly specified model, but do not apply to other frailty 
models and cannot be used to address model misspecification. 

In Section 3, a likelihood-based procedure for model (1.1) is formally pro- 
posed, and the existence of likelihood maximizers and of both score and in- 
formation operators is examined without requiring the model to be correctly 
specified. Section 4 establishes uniform consistency and weak convergence of 
the parameter estimators under mild identifiability conditions which ensure 
the uniqueness of the implied parameter corresponding to the maximizer 
of the asymptotic limit of the likelihood with respect to the true model. 
We also study properties of the estimators in settings where the model is 
not identifiable, as occurs when $\beta = 0$ and the frailty variance and
baseline hazard are confounded. To our knowledge, this is the first attempt at 
asymptotic theory for misspecified nonparametric maximum likelihood es- 
timation (NPMLE) for semiparametric survival regression models. White's 
(1982) work on robust parametric likelihood estimation is not directly appli- 
cable due to the presence of nonparametric components in (1.1). The closest 
related work is on asymptotic theory for the misspecified Cox model based 
on partial likelihood [Struthers and Kalbfleisch (1986), Lin and Wei (1989) 
and Sasieni (1993)]. However, these results do not apply to estimation based 
on full nonparametric likelihood. 

Because the parametric and nonparametric components in (1.1) are esti- 
mated simultaneously, inference is complicated. Parner (1998) showed that 
the variance of the NPMLE for the gamma frailty model with cluster sizes 
greater than or equal to 2 can be consistently estimated by inverting a 
discrete observed information matrix. However, computing the required sec- 
ond derivatives can be difficult when the likelihood does not have a closed 
form, for example, with log-normal frailties. Furthermore, the limiting co- 
variance function is extremely complicated and does not permit the con- 
struction of analytic confidence bands for functionals of the baseline hazard 
such as covariate-specific survival functions. The procedure we employ is to 
maximize the profile likelihood using a simple fixed-point algorithm for the 
baseline hazard motivated by the EM algorithm. We show that bootstrap- 
ping this procedure provides valid inference, including variance estimation 
and the construction of confidence bands for survival functions under model 
misspecification. The estimated survival probabilities may not be unbiased 
in large samples. However, it may be useful to interpret these quantities as 
minimizing the Kullback-Leibler discrepancy between the survival curves 
under the fitted and true models, conditionally on covariates. 

In Section 5, the identifiability conditions given in Section 4 are shown to 
be satisfied when the model is correctly specified. We further verify that the 
estimators achieve the semiparametric variance bound [Sasieni (1992) and 
Bickel, Klaassen, Ritov and Wellner (1993), hereafter abbreviated BKRW] 
and are fully efficient for all model parameters. Section 6 evaluates the con- 
ditions of Section 4 under misspecification. In this case the estimators are 
still uniformly consistent and converge weakly, but inference must be based 
on an infinite-dimensional analogue to White's (1982) robust variance for- 
mula. Our contributions beyond Parner's (1998) work on the shared gamma 
frailty model are threefold. First, we study univariate data. Second, we allow 
general frailty distributions. Third, we permit misspecification. 

The robust inferences are practically useful under some well-known mis- 
specification mechanisms. To begin, we establish that when the true model 
has the form (1.1) but the choice of the distribution of W is incorrect, the 
parameter estimate for a single covariate which is independent of one or 
more other misspecified covariates may be consistent up to sign. Note that 
all the covariates may be partially misspecified under mild restrictions. The 
setting applies in particular when assessing treatment effect in a
randomized trial. Next we show that if the covariates $Z$ are correctly
specified and $E[b'Z \mid \beta_0'Z]$ is linear in $\beta_0'Z$ for all linear
combinations $b'Z$, then the parameter estimates are consistent for
$\alpha_2\beta_0$, where $\beta_0$ is the true regression parameter and
$0 < \alpha_2 \in \mathbb{R}$. When the Cox model is used but the true model
has frailty variance greater than 0, the estimated effect will be
$\alpha_1\beta_0$, where $\alpha_1 \in (0,1)$ and $\beta_0$ is the true
effect. The conditional linearity assumption has been used by Li and Duan
(1989) to establish similar robustness results under link function violations
for parametric regression and for the Cox model based on partial likelihood
but without censoring. Our results are applicable under independent censoring
and are based on the full likelihood so that joint estimation of $\gamma$ and
$A(t) = \int_0^t a(s)\,ds$ as well as $\beta$ is possible, which may be
necessary for survival predictions.

While the focus of this paper is on independent survival times, many of 
the results and methods of proof are potentially applicable to multivariate 
failure time data. Furthermore, the fact that a proportional hazards model 
can be changed to a nonproportional hazards model by simply adding a 
frailty underscores the need to be careful when interpreting marginal in- 
ferences based on multivariate shared frailty models involving covariates. 
Nonproportional marginal hazards may not imply correlation of the failure 
times [Hougaard (2000)]. Extending the univariate results of this paper to 
the multivariate setting is an important topic for future research. 

Several computational issues are discussed in Section 7, and the utility 
of the methods is illustrated on the lymphoma data in Section 8. All proofs 
are given in Section 9. 

2. The data setup and frailty models. 

2.1. Data assumptions. The data $\{X_i = (V_i, \delta_i, Z_i), i = 1, \ldots, n\}$
consist of $n$ i.i.d. realizations of $X = (V, \delta, Z)$, where
$V = T \wedge C$, $\delta = 1\{T \le C\}$, $x \wedge y$ denotes the minimum of
$x$ and $y$, $1\{B\}$ is the indicator of $B$ and $C$ is the right censoring
time. The analysis is restricted to an interval $[0,\tau]$, where
$\tau < \infty$. The covariate $Z = \{Z(t), t \in [0,\tau]\}$ is assumed to be
a caglad (left-continuous with right-hand limits) process with
$Z(t) \in \mathbb{R}^d$, $t \in [0,\tau]$. We make the following additional
assumptions:

(A1) $P[C = 0] = 0$, $P[C \ge \tau \mid Z] = P[C = \tau \mid Z] > 0$ almost
surely, and censoring is independent of $T$ given $Z$.

(A1') Condition (A1) is strengthened to require that $C$ and $Z$ are
independent.

(A2) The true density of $T$ given $Z$, $f_0(t \mid Z)$, exists and is bounded
over $t \in [0,\tau]$ almost surely, and $P[T > \tau \mid Z] > 0$ almost
surely.

(A3) The total variation of $Z(\cdot)$ on $[0,\tau]$ is $\le m_0 < \infty$
almost surely, and var$[Z(0+)]$ is positive definite, where for a real
function $F$ with right-hand limits we define $F(t+) = \lim_{s \downarrow t} F(s)$.

(A3') Condition (A3) is strengthened to require that $Z = (Z_1, Z_2)$, where
$Z_1 \in \mathbb{R}$ is time independent and $Z_1$ and $Z_2$ are
stochastically independent.

(A3'') Condition (A3) is strengthened to require that $Z$ is time independent
and that $E[b'Z \mid c'Z]$ is linear in $c'Z$ for all $b, c \in \mathbb{R}^d$.

Conditions (A1) and (A2) are somewhat standard for right-censored regression
models, while condition (A3) is needed for both asymptotic normality in
Section 4.3 and for parameter identifiability when the model is correctly
specified in Section 5.1. The condition on var$[Z(0+)]$ is similar to
Parner's (1998) condition 2(g). The more restrictive assumptions (A1'), (A3')
and (A3'') are only used in Sections 6.2 and 6.3 for establishing robustness
results under misspecification. An important example of when condition (A3')
holds is when $Z_1$ indicates treatment and treatment assignment has been
randomized to ensure that $Z_2$, corresponding to other prognostic factors, is
independent of $Z_1$. An important example of when condition (A3'') holds is
when $Z$ is multivariate normal.
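
To make the data structure concrete, the following is a minimal simulation sketch under assumptions that are ours rather than the paper's: a unit baseline hazard $a(t) = 1$, a single standard normal covariate, a mean-one gamma frailty with variance $\gamma$, and a censoring time that is uniform on $(0, 2\tau)$ and truncated at $\tau$, so that $P[C = \tau] > 0$ and $C$ is independent of $Z$. The function name is hypothetical.

```python
import numpy as np

def simulate_right_censored(n, beta, gamma, tau, rng):
    """Draw (V, delta, Z) with V = min(T, C) and delta = 1{T <= C}.

    Illustrative assumptions: a(t) = 1, one N(0, 1) covariate, gamma frailty
    with mean 1 and variance gamma (no frailty when gamma == 0).
    """
    z = rng.normal(size=n)
    if gamma > 0:
        w = rng.gamma(shape=1.0 / gamma, scale=gamma, size=n)
    else:
        w = np.ones(n)
    # Given (W, Z) the hazard is W * exp(beta * Z), so T is exponential.
    t = rng.exponential(size=n) / (w * np.exp(beta * z))
    # Censoring truncated at tau: P[C >= tau] = P[C = tau] = 1/2 > 0.
    c = np.minimum(rng.uniform(0.0, 2.0 * tau, size=n), tau)
    v = np.minimum(t, c)
    delta = (t <= c).astype(int)
    return v, delta, z

rng = np.random.default_rng(2004)
v, delta, z = simulate_right_censored(500, beta=0.5, gamma=1.0, tau=2.0, rng=rng)
```

Every observed time then lies in $(0, \tau]$, and subjects still at risk at $\tau$ are administratively censored there.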

2.2. Frailty model assumptions. The frailty models we consider in this
paper posit that the hazard function has the form (1.1). After integrating
over $W$, the corresponding survival function at time $t$ given $Z = z$
becomes

(2.1)    $S(t \mid z) = P[T > t \mid Z = z] = \Lambda_\gamma\left(\int_0^t e^{\beta' z(s)}\,dA(s)\right)$,

where $W$ is continuous and independent of $Z$, $A(s) = \int_0^s a(u)\,du$,
$\Lambda_\gamma(t) = \int_0^\infty e^{-wt} f(w;\gamma)\,dw$ is the Laplace
transform of $W$ and $\gamma \in \mathbb{R}$ is an unknown parameter. At this
point we are not assuming that the posited model agrees with the true
conditional density $f_0$. The remainder of this section contains technical
conditions on the frailty models, conditions (B), (C), (D1)-(D3), (E1) and
(E2), which some readers may wish to skip over on the first reading.

We assume that the posited model consists of a family of frailty transforms
$\{\Lambda_\gamma\}$ and a collection of indices $\{\psi = (\gamma, \beta, A)\}$,
which satisfy:

(B) $\beta \in \bar B_0$, where $B_0 \subset \mathbb{R}^d$ contains 0 and is
open, convex and bounded and where $\bar B$ denotes closure of a set $B$.

(C) There exist a constant $c_0$ and a continuous, decreasing function
$\varepsilon_0 : [0,\infty) \mapsto (0, 3/4)$, so that
$0 < c_0 < \varepsilon_0(0) < 3/4$, $\lim_{t\to\infty}\varepsilon_0(t) = 0$
and, for each positive $m, t < \infty$, there is an extension of
$\Lambda_{(\cdot)}(\cdot) : [0,m) \times [0,t] \mapsto [0,1]$ having domain
$[-\varepsilon_0(t), m] \times [0,t]$.

For the parametric component $\theta = (\gamma, \beta)$, define the parameter
set $\Theta = (-c_0, m_1) \times B_0$ for some positive $m_1 < \infty$. In
consequence of conditions (A3) and (B), let $1 < K_0 < \infty$ be the maximum
possible value of 1 and both [...] over $\beta \in \bar B_0$ and
$t \in [0,\tau]$. Also let $\mathcal{A}$ be the collection of monotone
increasing functions $A : [0,\tau] \mapsto [0,\infty)$, with
$A(\tau) < \infty$, and define $\mathcal{A}_0$ to be the subset of
$\mathcal{A}$ consisting of absolutely continuous functions with derivative
satisfying $0 \le a(t) < \infty$ for all $t \in [0,\tau]$. For
$\gamma \in [-c_0, m_1]$, define $\mathcal{A}_{(\gamma)}$ so that
$\mathcal{A}_{(\gamma)} = \mathcal{A}$ when $\gamma \ge 0$, and
$\mathcal{A}_{(\gamma)} = \{A \in \mathcal{A} : A(\tau) \le \varepsilon_0^{-1}(-\gamma)/K_0\}$
when $\gamma < 0$. Also define $\mathcal{A}^0_{(\gamma)} = \mathcal{A}_0$ when
$\gamma \ge 0$ and
$\mathcal{A}^0_{(\gamma)} = \{A \in \mathcal{A}_0 : A(\tau) \le \varepsilon_0^{-1}(-\gamma)/K_0\}$
when $\gamma < 0$. We can now define the index sets
$\Psi = \{\psi : (\gamma,\beta) \in \bar\Theta, A \in \mathcal{A}_{(\gamma)}\} = \{\psi : A \in \mathcal{A}, \beta \in \bar B_0, \gamma \in [-\varepsilon_0(K_0 A(\tau)), m_1]\}$
and
$\Psi_0 = \{\psi : (\gamma,\beta) \in \bar\Theta, A \in \mathcal{A}^0_{(\gamma)}\}$.

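We need the following additional conditions on $\Lambda_\gamma$, where we
define $\dot\Lambda_\gamma = d\Lambda_\gamma(t)/(dt)$,
$\ddot\Lambda_\gamma = d\dot\Lambda_\gamma(t)/(dt)$,
$G_\gamma = -\log\Lambda_\gamma$, $\dot G_\gamma = dG_\gamma(t)/(dt)$,
$\ddot G_\gamma = d\dot G_\gamma(t)/(dt)$,
$\dddot G_\gamma = d\ddot G_\gamma(t)/(dt)$,
$G^{(1)}_\gamma = dG_\gamma/(d\gamma)$,
$\dot G^{(1)}_\gamma = dG^{(1)}_\gamma(t)/(dt)$ and
$\ddot G^{(1)}_\gamma = d\dot G^{(1)}_\gamma(t)/(dt)$: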



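(D1) For each positive $t < \infty$, we have the following for all
$\gamma \in [-\varepsilon_0(t), m_1]$ and $u \in [0,t]$:
$\Lambda_\gamma(0+) = 1$, $\Lambda_\gamma(u) > 0$,
$0 < -\dot\Lambda_\gamma(u) < \infty$ and $0 \le \ddot\Lambda_\gamma(u) < \infty$;
$d\dddot G_\gamma(u)/(du)$, $d\ddot G^{(1)}_\gamma(u)/(du)$,
$d\ddot G^{(1)}_\gamma(u)/(d\gamma)$ and $d^2\dot G^{(1)}_\gamma(u)/(d\gamma)^2$
exist and are bounded; $\dot G_\gamma(0+) = 1$, $\dot G^{(1)}_\gamma(0+) = 0$,
$\ddot G_\gamma(u) \le 0$ and $\ddot G^{(1)}_\gamma(0+) < 0$.

(D2) There exists a $c_1 : (0, m_1] \mapsto (0,\infty]$ such that, for any
sequence $\{\gamma_k\} \in [0, m_1]$ with $\gamma_k \to \gamma > 0$,
$\limsup_{k\to\infty}\sup_{u \ge 0} u^{c_1(\gamma)}\Lambda_{\gamma_k}(u) < \infty$
and
$\limsup_{k\to\infty}\sup_{u \ge 0} |u^{1 + c_1(\gamma)}\dot\Lambda_{\gamma_k}(u)| < \infty$.

(D3) There exists a $c_2 : [0, m_1] \mapsto (0,\infty]$, with
$c_2(0) = \infty$, such that for all sequences $t_k \to \infty$ and
$\{\gamma_k\} \in [-\varepsilon_0(t_k), m_1]$ with $\gamma_k \to \gamma \ge 0$,
$\liminf_{k\to\infty}\inf_{u \in [0,t_k]} t_k \dot G_{\gamma_k}(u) \ge c_2(\gamma)$.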

(E1) For all $\gamma \in [0, m_1]$ and all $t \in [0,\infty)$, $\dot G_\gamma(t) + t\ddot G_\gamma(t) > 0$ and 



G 7 (i) 
GL(t 



G 7 (t) 



<0. 



G 7 (t) lG 7 (t). 

(E2) $\lim_{\gamma \downarrow 0} E[(W_\gamma - 1)^2]/\gamma = 1$ and
$\lim_{\gamma \downarrow 0} E[|W_\gamma - 1|^3]/\gamma = 0$, where $W_\gamma$
is a random variable with Laplace transform $\Lambda_\gamma$.

Conditions (D1)-(D3) are needed for uniform consistency and weak convergence
of the estimators. Condition (D1) is also used for identifiability when the
model is correctly specified. Conditions (E1) and (E2) are needed for
identifiability under misspecification.

Remark 1. Condition (C) ensures that $\gamma_0 = 0$ is an interior point.
Parts of (D1)-(D3) are conditions on the moments of $W$. For (D1) this follows
since $\dot G_\gamma(0+) = E[W]$ and $\ddot G_\gamma(0+) = -\mathrm{var}[W]$.
Condition (D2) is satisfied if there exists a continuous function
$c_1 : (0, m_1] \mapsto (0,\infty)$ such that $E[W^{-c_1(\gamma)}] < \infty$
for all $\gamma \in (0, m_1]$.
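
The moment identities in Remark 1 can be checked numerically. The sketch below is illustrative only: it uses one-sided finite differences at $0+$ for two closed-form choices of $G_\gamma = -\log\Lambda_\gamma$, the gamma frailty, $G_\gamma(t) = \gamma^{-1}\log(1+\gamma t)$, and the inverse Gaussian frailty, $G_\gamma(t) = \gamma^{-1}[(1+2\gamma t)^{1/2} - 1]$, both of which have mean 1 and variance $\gamma$. The helper names and the step size are our own choices.

```python
import numpy as np

def G_gamma(t, g):
    """G for the gamma frailty with mean 1 and variance g."""
    return np.log1p(g * t) / g

def G_invgauss(t, g):
    """G for the inverse Gaussian frailty with mean 1 and variance g."""
    return (np.sqrt(1.0 + 2.0 * g * t) - 1.0) / g

def derivs_at_zero(G, g, eps=1e-4):
    """One-sided finite-difference estimates of dG/dt and d2G/dt2 at t = 0+."""
    g0, g1, g2 = G(0.0, g), G(eps, g), G(2.0 * eps, g)
    first = (-3.0 * g0 + 4.0 * g1 - g2) / (2.0 * eps)
    second = (g0 - 2.0 * g1 + g2) / eps ** 2
    return first, second

variance = 0.5
mean_gamma, negvar_gamma = derivs_at_zero(G_gamma, variance)
mean_ig, negvar_ig = derivs_at_zero(G_invgauss, variance)
```

Up to discretization error, both first derivatives equal the frailty mean 1 and both second derivatives equal minus the frailty variance.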
2.3. Examples of frailty models. The following are instances of frailty 
transforms: 

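1. The gamma frailty has $\Lambda_\gamma(t) = (1 + \gamma t)^{-1/\gamma}$.

2. The inverse Gaussian frailty [Hougaard (1984)] has
$\Lambda_\gamma(t) = \exp\{-\gamma^{-1}[(1 + 2\gamma t)^{1/2} - 1]\}$.

3. The log-normal frailty [McGilchrist and Aisbett (1991)] has

$\Lambda_\gamma(t) = \int_{-\infty}^{\infty} \exp\{-t e^{\gamma^{1/2} v - \gamma/2}\}\,\phi(v)\,dv$,

where $\phi$ is the standard normal density.

4. The positive stable frailty [Hougaard (1986)] has
$\Lambda_\gamma(t) = \exp\{-t^\gamma\}$.

5. The IGG($\alpha$) family of frailty transforms has the form

$\Lambda_\gamma(t) = \exp\left\{-\frac{1-\alpha}{\alpha\gamma}\left[\left(1 + \frac{\gamma t}{1-\alpha}\right)^{\alpha} - 1\right]\right\}$,

where $\alpha \in [0,1)$ is assumed known. The IGG(0) family is obtained by
taking the limit as $\alpha \downarrow 0$.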

Remark 2. The IGG($\alpha$) family includes both the gamma frailty
($\alpha = 0$) and the inverse Gaussian frailty ($\alpha = 1/2$). The IG is
for "inverse Gaussian" and the second G for "gamma."

Remark 3. $\Lambda_\gamma$ and the functionals of $\Lambda_\gamma$ introduced
above are defined at $\gamma = 0$ by continuity. In all the above frailties,
excepting the positive stable frailty,
$\Lambda_0(t) = \lim_{\gamma \downarrow 0}\Lambda_\gamma(t) = e^{-t}$,
corresponding to the Cox model.
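
The relationships in Remarks 2 and 3 can be verified numerically. The following sketch is illustrative and the function names are ours: it codes the gamma, inverse Gaussian and IGG($\alpha$) transforms in closed form, and approximates the log-normal transform, which has no closed form, by Gauss-Hermite quadrature with an arbitrarily chosen number of nodes.

```python
import numpy as np

def lap_gamma(t, g):
    return (1.0 + g * t) ** (-1.0 / g)

def lap_invgauss(t, g):
    return np.exp(-(np.sqrt(1.0 + 2.0 * g * t) - 1.0) / g)

def lap_igg(t, g, alpha):
    """IGG(alpha) transform; alpha = 0 is defined as the gamma limit."""
    if alpha == 0:
        return lap_gamma(t, g)
    inner = (1.0 + g * t / (1.0 - alpha)) ** alpha - 1.0
    return np.exp(-(1.0 - alpha) / (alpha * g) * inner)

def lap_lognormal(t, g, n_nodes=80):
    """E[exp(-t W)] with log W ~ N(-g/2, g), by Gauss-Hermite quadrature."""
    x, w = np.polynomial.hermite_e.hermegauss(n_nodes)  # weight exp(-x^2 / 2)
    frailty = np.exp(np.sqrt(g) * x - g / 2.0)
    values = np.exp(-np.outer(np.atleast_1d(t), frailty))
    return values @ w / np.sqrt(2.0 * np.pi)

t = np.array([0.0, 0.5, 1.0, 2.0])
igg_half = lap_igg(t, 0.8, 0.5)       # should match the inverse Gaussian
igg_near0 = lap_igg(t, 0.8, 1e-7)     # should approach the gamma
cox_limits = [lap(t, 1e-6) for lap in (lap_gamma, lap_invgauss, lap_lognormal)]
```

With $\alpha = 1/2$ the IGG transform coincides with the inverse Gaussian transform, with $\alpha$ near 0 it approaches the gamma transform, and for $\gamma$ near 0 all three transforms in `cox_limits` are close to $e^{-t}$.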

The following states that most of the stated frailty conditions are valid 
for a number of standard frailty families: 

Proposition 1. Conditions (C), (D1)-(D3) and (E2) are satisfied by
the gamma, inverse Gaussian, log-normal and IGG($\alpha$), for any fixed
$\alpha \in [0,1)$, frailty distributions.

Remark 4. Verification of these conditions for the log-normal is hard
technically since $\Lambda_\gamma$ does not have a closed form.

Remark 5. Condition (E1) is easily verified for the gamma, inverse
Gaussian and IGG($\alpha$) frailties, and has been validated numerically for
the log-normal frailty for $\gamma \in [0, 4.62]$, corresponding to a frailty
variance of 100. We conjecture that (E1) holds for the log-normal frailty for
all $0 \le \gamma < \infty$.

Remark 6. For the positive stable frailty, conditions (D1), (D3), (E1)
and (E2) are not satisfied but conditions (C) and (D2) are. For example,
$\dot G_\gamma(0+) = \infty$ when $\gamma < 1$. Note also for this frailty
that, when $Z$ is time independent,
$-\log S(t \mid z) = e^{\gamma\beta' z}A^\gamma(t)$, and the model is thus not
identifiable.
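
The lack of identifiability for the positive stable frailty is easy to exhibit numerically. With time-independent $z$, the transform $\exp\{-t^\gamma\}$ gives $-\log S(t \mid z) = \{e^{\beta' z}A(t)\}^\gamma$, so any two parameter triples sharing the same $\gamma\beta$ and the same $A^\gamma$ imply the same survival function. The sketch below uses hypothetical parameter values and a scalar covariate.

```python
import numpy as np

def neg_log_surv_stable(A, z, beta, gamma):
    """-log S(t|z) under the positive stable frailty, time-independent z."""
    return (np.exp(beta * z) * A) ** gamma

A1 = np.linspace(0.1, 3.0, 6)   # illustrative values of A(t) on a time grid
z = 1.3
gamma1, beta1 = 0.5, 0.8

gamma2 = 0.25
beta2 = gamma1 * beta1 / gamma2      # keeps gamma * beta unchanged
A2 = A1 ** (gamma1 / gamma2)         # keeps A(t) ** gamma unchanged

first = neg_log_surv_stable(A1, z, beta1, gamma1)
second = neg_log_surv_stable(A2, z, beta2, gamma2)
```

The arrays `first` and `second` agree, although the two triples $(\gamma, \beta, A)$ differ in every component.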
3. Nonparametric maximum likelihood estimation. 

3.1. The estimator. The nonparametric log-likelihood has the form

(3.1)    $L_n(\psi) = \mathbb{P}_n\left\{\int_0^\tau [\log\dot G_\gamma(H^\psi(s)) + \beta' Z(s) + \log a(s)]\,dN(s) - G_\gamma(H^\psi(V))\right\}$,

where $N(t) = 1\{V \le t, \delta = 1\}$, $Y(t) = 1\{V \ge t\}$,
$H^\psi(t) = \int_0^t e^{\beta' Z(s)}\,dA(s)$, $a = dA/dt$ and $\mathbb{P}_n$
is the expectation with respect to the empirical probability measure. As
discussed by Murphy, Rossini and van der Vaart (1997), the maximum likelihood
estimator for $a$ does not exist, because any unrestricted maximizer of (3.1)
puts mass only at observed failure times and is not a continuous hazard.

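Instead, we compute the maximizer by profiling over $A$. This yields
estimators for $\theta = (\gamma, \beta)$ and $A$, but not $a$. The profile
likelihood is
$pL_n(\theta) = \sup_{A \in \mathcal{A}_{(\gamma)}} L_n(\psi) = L_n(\theta, \hat A_\theta)$,
where $\hat A_\theta = \arg\max_{A \in \mathcal{A}_{(\gamma)}} L_n(\theta, A)$.
Consider one-dimensional submodels for $A$,
$t \mapsto A_t(\cdot) = \int_0^{(\cdot)}\{1 + th(s)\}\,dA(s)$, where $(\cdot)$
denotes an argument ranging over $[0,\tau]$ and
$h : [0,\tau] \mapsto \mathbb{R}$ is a bounded function. When the $\gamma$
component of $\theta$ is nonnegative, the upper limit of elements of
$\mathcal{A}_{(\gamma)}$ is unconstrained. In this setting one may
differentiate $L_n\{(\theta, A_t)\}$ with respect to $t$, where
$h(s) = 1\{s \le u\}$ and $u \in [0,\tau]$, and solve for $A$ with $t = 0$,
since $A = A_{t=0}$. Hence, $\hat A_\theta$ solves

(3.2)    $\hat A_\theta(u) = \int_0^u \left(\mathbb{P}_n\left[Y(s)e^{\beta' Z(s)}\left\{\dot G_\gamma(H^{\hat\psi_\theta}(V)) - \delta\,\frac{\ddot G_\gamma(H^{\hat\psi_\theta}(V))}{\dot G_\gamma(H^{\hat\psi_\theta}(V))}\right\}\right]\right)^{-1}\mathbb{P}_n\{dN(s)\} \equiv \int_0^u \{J_n^{\hat\psi_\theta}(s)\}^{-1}\,\mathbb{P}_n\{dN(s)\}$,

where $\hat\psi_\theta = (\theta, \hat A_\theta)$.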
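
The self-consistency equation (3.2) suggests a fixed-point iteration for the baseline cumulative hazard at a fixed $\theta$ with $\gamma \ge 0$. The following is a minimal sketch rather than the authors' implementation, and it makes several assumptions of ours: a gamma frailty, for which $\dot G_\gamma(H) - \delta\ddot G_\gamma(H)/\dot G_\gamma(H) = (1 + \gamma\delta)/(1 + \gamma H)$, a single time-independent covariate, a Nelson-Aalen starting value, and a step function $A$ with jumps at the observed failure times. The function name, iteration limit and tolerance are hypothetical.

```python
import numpy as np

def profile_baseline_gamma(v, delta, z, beta, gamma, n_iter=500, tol=1e-10):
    """Fixed-point iteration for the jumps of A at the distinct failure times."""
    v, delta, z = (np.asarray(x, dtype=float) for x in (v, delta, z))
    fail_times = np.unique(v[delta == 1])
    # Number of failures at each distinct failure time.
    d = np.array([np.sum((v == s) & (delta == 1)) for s in fail_times])
    risk = v[None, :] >= fail_times[:, None]      # Y_i(s_k), shape (K, n)
    elp = np.exp(beta * z)
    jumps = d / risk.sum(axis=1)                  # Nelson-Aalen start
    for _ in range(n_iter):
        # H_i = exp(beta z_i) * A(V_i), with A(V_i) the sum of jumps up to V_i.
        H = elp * (jumps[:, None] * risk).sum(axis=0)
        weight = (1.0 + gamma * delta) / (1.0 + gamma * H)
        new_jumps = d / (risk * (elp * weight)[None, :]).sum(axis=1)
        converged = np.max(np.abs(new_jumps - jumps)) < tol
        jumps = new_jumps
        if converged:
            break
    return fail_times, jumps

v = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
delta = np.array([1, 0, 1, 1, 0])
z = np.zeros(5)
times0, jumps0 = profile_baseline_gamma(v, delta, z, beta=0.0, gamma=0.0)
times1, jumps1 = profile_baseline_gamma(v, delta, z, beta=0.0, gamma=0.5)
```

With $\gamma = 0$ the weights equal 1 and the iteration returns the Breslow estimator at once (here the Nelson-Aalen jumps, since $\beta = 0$). For the gamma frailty the weight $(1+\gamma\delta)/(1+\gamma H)$ is the conditional mean of the frailty given the observed data, which is the sense in which the update resembles an EM step.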

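Under model misspecification, it is possible that the best fit will occur for
some $\gamma < 0$. In this case, $f(w;\gamma)$ will usually not be a density,
even though the quantity $S(t \mid z)$ in (2.1) is a proper survival function
provided $A(t)$ is not too large. Specifically, all
$A \in \mathcal{A}_{(\gamma)}$ must satisfy
$A(\tau) \le \varepsilon_0^{-1}(-\gamma)/K_0$. Under this constraint, one may
differentiate $L_n\{(\theta, A_t)\}$ with respect to $t$, where
$h(s) = 1\{v < s \le u\} - [A(u) - A(v)]/A(\tau)$ and $u, v \in [0,\tau]$,
take $t = 0$, let $v \uparrow u$ and solve for $A$. This yields

(3.3)    $\hat A_\theta(u) = \int_0^u \{J_n^{\hat\psi_\theta}(s) + \rho_n(\hat\psi_\theta)\}^{-1}\,\mathbb{P}_n\{dN(s)\}$,

where
$J_n^\psi(s) = \mathbb{P}_n[Y(s)e^{\beta' Z(s)}\{\dot G_\gamma(H^\psi(V)) - \delta\,\ddot G_\gamma(H^\psi(V))/\dot G_\gamma(H^\psi(V))\}]$
is the quantity inverted in (3.2) and

$\rho_n(\psi) = \frac{\mathbb{P}_n N(\tau) - \int_0^\tau J_n^\psi(s)\,dA(s)}{A(\tau)}$.

By considering one-dimensional submodels $\{A_s\}$ with $h(s) = -1$, the fact
that the derivative of $L_n(\theta, A_s)$ at $s = 0$ is nonpositive implies
that $\rho_n(\hat\psi_\theta) \ge 0$. Thus, for all $\theta \in \bar\Theta$,
$\hat A_\theta$ has the form given in (3.3), with
$\rho_n(\hat\psi_\theta) > 0$ only when $\gamma < 0$ and
$\hat A_\theta(\tau) = \varepsilon_0^{-1}(-\gamma)/K_0$, and
$\rho_n(\hat\psi_\theta) = 0$ otherwise.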

The same maximizer occurs with $\Delta A$ in place of $a$ in $L_n(\psi)$,
where $\Delta A(s) = A(s) - A(s-)$ and $A(s-) = \lim_{t \uparrow s} A(t)$.
That is, one maximizes $L_n(\psi)$ over all $A$ with jumps at the observed
failure times. We denote by $\tilde L_n(\psi)$ the log-likelihood expression
with $\Delta A$ in place of $a$. The nonparametric maximum likelihood
estimator (NPMLE) is $\hat\psi_n = (\hat\theta_n, \hat A_{\hat\theta_n})$,
where $\hat\theta_n = \arg\max_{\theta \in \bar\Theta} pL_n(\theta)$.
Equivalently, $\hat\psi_n = \arg\max_{\psi \in \Psi}\tilde L_n(\psi)$.
We have the following existence result. 

Proposition 2. Under conditions (A1)-(A3), (B), (C) and (D1)-(D3),
and provided $\max_{1 \le i \le n}\delta_i > 0$, then for some
$1 < M < \infty$ and all $\theta \in \bar\Theta$ there exist maximizers
$\hat A_\theta$, and all such maximizers satisfy (3.3) and
$1/M \le \hat A_\theta(\tau) \le M$.

Remark 7. Proposition 2 implies the existence of an NPMLE $\hat\psi_n$ as a
consequence of the compactness of $\bar\Theta$. However, the proposition says
nothing about uniqueness of the NPMLE.
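
For concreteness, the discrete log-likelihood with jumps $\Delta A$ in place of $a$ can be evaluated directly. The sketch below is an illustration under assumptions of ours (gamma frailty, one time-independent covariate, hypothetical function names and toy data). It also applies one fixed-point update of the jumps at fixed $\theta$, which for the gamma frailty coincides with an EM step and therefore should not decrease the log-likelihood.

```python
import numpy as np

def G(h, gamma):
    """G_gamma = -log(Laplace transform) for the gamma frailty (Cox if 0)."""
    return h if gamma == 0 else np.log1p(gamma * h) / gamma

def log_dG(h, gamma):
    """log of dG_gamma/dt for the gamma frailty: -log(1 + gamma * h)."""
    return -np.log1p(gamma * h)

def discrete_log_lik(jumps, fail_times, v, delta, z, beta, gamma):
    """Log-likelihood with the jump of A at each failure time replacing a(s)."""
    risk = v[None, :] >= fail_times[:, None]            # Y_i(s_k)
    H = np.exp(beta * z) * (jumps[:, None] * risk).sum(axis=0)
    # Index of each subject's own failure time; only used when delta == 1.
    own = np.clip(np.searchsorted(fail_times, v), 0, len(fail_times) - 1)
    event_part = delta * (log_dG(H, gamma) + beta * z + np.log(jumps[own]))
    return np.mean(event_part - G(H, gamma))

def fixed_point_step(jumps, fail_times, v, delta, z, beta, gamma):
    """One update of the jumps using the gamma-frailty weights."""
    risk = v[None, :] >= fail_times[:, None]
    elp = np.exp(beta * z)
    H = elp * (jumps[:, None] * risk).sum(axis=0)
    weight = (1.0 + gamma * delta) / (1.0 + gamma * H)
    d = np.array([np.sum((v == s) & (delta == 1)) for s in fail_times])
    return d / (risk * (elp * weight)[None, :]).sum(axis=1)

v = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
delta = np.array([1, 0, 1, 1, 0])
z = np.array([0.0, 1.0, 0.0, 1.0, 0.0])
beta, gamma = 0.3, 0.5
fail_times = np.unique(v[delta == 1])
start = 1.0 / (v[None, :] >= fail_times[:, None]).sum(axis=1)  # Nelson-Aalen
ll_start = discrete_log_lik(start, fail_times, v, delta, z, beta, gamma)
updated = fixed_point_step(start, fail_times, v, delta, z, beta, gamma)
ll_updated = discrete_log_lik(updated, fail_times, v, delta, z, beta, gamma)
```

Iterating `fixed_point_step` to convergence and then maximizing the resulting value over $(\gamma, \beta)$ is one way to evaluate the profile likelihood in this special case.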

For a limiting value of the NPMLE to exist, it is necessary (but not
sufficient) that the $M$ in Proposition 2 does not go to $\infty$ as
$n \to \infty$. However, a significantly stronger result can be obtained.
Define
$\mathcal{K}_M = \{A \in \mathcal{A}_0 : 1/M \le A(\tau) \le M, \sup_{t \in [0,\tau]} a(t) \le M\}$
and, for each $\varepsilon > 0$, define
$\mathcal{K}_M^\varepsilon = \{A \in \mathcal{A} : \sup_{t \in [0,\tau]}|A(t) - \tilde A(t)| \le \varepsilon$ for some $\tilde A \in \mathcal{K}_M\}$.
Note that $\mathcal{K}_M$ is compact for each $1 < M < \infty$. Let $P_*$
denote inner probability. We have the following result.

Theorem 1. Assume conditions (A1)-(A3), (B), (C) and (D1)-(D3).
Then, for each $\eta > 0$, there exists some $1 < M < \infty$ such that
$\lim_{\varepsilon \downarrow 0} P_*(\{\hat A_\theta : \theta \in \bar\Theta\} \subset \mathcal{K}_M^\varepsilon$ for all $n$ large enough$) \ge 1 - \eta$.

Remark 8. Theorem 1 implies that all sequences of NPMLE's have
convergent subsequences and that the resulting limit points for $\hat A_n$
have bounded derivatives almost surely. Consistency will then follow from
identifiability of the model. Moreover, when only some of the parameters are
identifiable, consistency of the identifiable parameters will also follow. The
important example of estimation of $\beta$ when the survival distribution does
not depend on covariates is discussed in Section 4.2.

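3.2. Kullback-Leibler information. We now establish properties of the
Kullback-Leibler information. Let
$p_\psi(v, e \mid z) = f_\psi^e(v \mid z)S_\psi^{1-e}(v \mid z)$, where
$f_\psi(t \mid z) = -dS_\psi(t \mid z)/(dt)$ and
$S_\psi(t \mid z) = \exp\{-G_\gamma(H^\psi(t))\}$ as defined in (2.1). For each
$\theta \in \bar\Theta$, let
$\tilde A_\theta = \arg\max_{A \in \mathcal{A}_{(\gamma)} \cap \mathcal{A}_0} P_0\log(p_\psi)$
and $\tilde\psi_\theta = (\theta, \tilde A_\theta)$. We have the following
result.

Lemma 1. Under conditions (A1)-(A3), (B), (C) and (D1)-(D3), and
for some $1 < M < \infty$, $\tilde A_\theta \in \mathcal{K}_M$ for all
$\theta \in \bar\Theta$.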

Remark 9. Lemma 1 tells us that, even without any model identifiabil- 
ity, all possible Kullback-Leibler maximizers lie in a compact set. Questions 
about consistency can thus be reduced to questions about identifiability or 
partial identifiability as mentioned in Remark 8. 

For each $\theta \in \bar\Theta$, let
$\hat\psi_\theta = (\theta, \hat A_\theta)$ and
$\tilde\psi_\theta = (\theta, \tilde A_\theta)$. For any $\psi_1, \psi_2$ in
the subset of $\Psi$ where $A_1$ and $A_2$ have jumps only at observed failure
times, define the empirical Kullback-Leibler information
$I_n(\psi_1, \psi_2) = \tilde L_n(\psi_1) - \tilde L_n(\psi_2)$, and, for any
$\psi_1, \psi_2$ in the subset of $\Psi$ for which the derivatives $a_1$ and
$a_2$ exist, define the Kullback-Leibler information
$I_0(\psi_1, \psi_2) = P_0\log(p_{\psi_1}/p_{\psi_2})$. The following theorem
establishes, in the profile context, an important asymptotic equivalence
between $I_n$ and $I_0$.

Theorem 2. Under conditions (A1)-(A3), (B), (C) and (D1)-(D3),

$\sup_{\theta_1, \theta_2 \in \bar\Theta}|I_n(\hat\psi_{\theta_1}, \hat\psi_{\theta_2}) - I_0(\tilde\psi_{\theta_1}, \tilde\psi_{\theta_2})| \to 0$

outer almost surely, as $n \to \infty$.

Remark 10. While Proposition 2 and Lemma 1 establish the existence
of the profile maximizers $\hat A_\theta$ and $\tilde A_\theta$, uniqueness is
not established. However, Theorem 2 tells us that all members of the
equivalence class $\hat A_\theta$ are asymptotically equivalent to all members
of the equivalence class $\tilde A_\theta$ in terms of Kullback-Leibler
information. Thus, model identifiability immediately implies asymptotic
uniqueness.

3.3. Score and information operators. In this section we derive the score
and information operators. These play a key role in the weak convergence
results presented in later sections. For each $\psi \in \Psi$ with $A$ having
bounded derivative, define the one-dimensional submodels
$t \mapsto \psi_t = \psi + t\{h_1, h_2, \int_0^{(\cdot)} h_3(s)\,dA(s)\}$,
where $(h_1, h_2, h_3) \in H_r$ for some $r < \infty$ and where $H_r$ is the
space of elements $h = (h_1, h_2, h_3)$ such that $h_1 \in \mathbb{R}$,
$h_2 \in \mathbb{R}^d$, $h_3$ is a cadlag (right-continuous with left-hand
limits) function and $|h_1| + \sqrt{h_2'h_2} + \|h_3\|_v \le r$, with
$\|\cdot\|_v$ being the total variation norm. Let
$H_\infty = \bigcup_{0 < r < \infty} H_r$. Since $\psi$ can be represented as a
functional on $H_r$ of the form
$\psi(h) = h_1\gamma + h_2'\beta + \int_0^\tau h_3(s)\,dA(s)$, the parameter
space $\Psi$ is then a subset of $\ell^\infty(H_r)$ with norm
$\|\psi\|_{(r)} = \sup_{h \in H_r}|\psi(h)|$, where $\ell^\infty(B)$ is the
space of bounded functionals on $B$. For $\psi \in \Psi$ and $g, h \in H_r$,
define

$\psi_g(h) = g_1h_1 + g_2'h_2 + \int_0^\tau g_3(s)h_3(s)\,dA(s)$.



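Note that $H_1$ is rich enough to extract all components of $\psi$ since
$H_1$ includes
$\{h : h_1 = 1, h_2 = h_3 = 0\} \cup \{h : h_2'h_2 = 1, h_1 = h_3 = 0\} \cup \{h : h_1 = h_2 = 0, h_3(s) = 1\{s \le t\}, t \in [0,\tau]\}$.
Let

(3.4)    $U_n^\tau(\psi)(h) \equiv \frac{\partial}{\partial t}L_n(\psi_t)\Big|_{t=0}$
         $= \mathbb{P}_n\Bigg\{\left[\delta\,\frac{\dot G^{(1)}_\gamma(H^\psi(V \wedge \tau))}{\dot G_\gamma(H^\psi(V \wedge \tau))} - G^{(1)}_\gamma(H^\psi(V \wedge \tau))\right]h_1$
         $\quad + \int_0^\tau [Z'(s)h_2 + h_3(s)]\,dN(s)$
         $\quad + \left[\delta\,\frac{\ddot G_\gamma(H^\psi(V \wedge \tau))}{\dot G_\gamma(H^\psi(V \wedge \tau))} - \dot G_\gamma(H^\psi(V \wedge \tau))\right]$
         $\qquad \times \int_0^\tau Y(s)e^{\beta' Z(s)}[Z'(s)h_2 + h_3(s)]\,dA(s)\Bigg\}$
         $\equiv \mathbb{P}_nU^\tau(\psi)(h)$.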

This score operator can easily be extended so that the bounded derivative
restriction on $A$ is unnecessary. The operator has expectation
$U_0^\tau(\psi) = P_0U^\tau(\psi)$. The dependence on $\tau$ will be needed
later.
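
As a sanity check on the $\gamma$-direction of the score, that is, $h = (1, 0, 0)$, one can compare the analytic expression $\mathbb{P}_n[\delta\,\dot G^{(1)}_\gamma(H)/\dot G_\gamma(H) - G^{(1)}_\gamma(H)]$ with a finite-difference derivative of the $\gamma$-dependent part of the log-likelihood. The sketch below is illustrative and assumes a gamma frailty, for which all derivatives of $G_\gamma$ are available in closed form. The values standing in for $H^\psi(V \wedge \tau)$ and the function names are hypothetical.

```python
import numpy as np

def G(h, g):
    return np.log1p(g * h) / g

def dG(h, g):
    """Derivative of G in its time argument."""
    return 1.0 / (1.0 + g * h)

def G1(h, g):
    """Derivative of G in gamma."""
    return h / (g * (1.0 + g * h)) - np.log1p(g * h) / g ** 2

def dG1(h, g):
    """Derivative of dG in gamma."""
    return -h / (1.0 + g * h) ** 2

def gamma_part_log_lik(H, delta, g):
    """Terms of the log-likelihood that depend on gamma, for fixed (beta, A)."""
    return np.mean(delta * np.log(dG(H, g)) - G(H, g))

def score_gamma(H, delta, g):
    """Score in the direction h = (1, 0, 0)."""
    return np.mean(delta * dG1(H, g) / dG(H, g) - G1(H, g))

H = np.array([0.2, 0.7, 1.5, 3.0])   # illustrative cumulative hazards
delta = np.array([1, 0, 1, 1])
g, step = 0.6, 1e-6
finite_diff = (gamma_part_log_lik(H, delta, g + step)
               - gamma_part_log_lik(H, delta, g - step)) / (2.0 * step)
analytic = score_gamma(H, delta, g)
```

The two quantities `finite_diff` and `analytic` agree to within the accuracy of the central difference.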

The Gateaux derivative of $U_0^\tau(\psi)(h)$ at $\psi_1 \in \Psi$ exists and
is obtained by differentiating the score operator for the submodels
$t \mapsto \psi_1 + t\psi$. This derivative is

$\frac{d}{dt}U_0^\tau(\psi_1 + t\psi)(h)\Big|_{t=0} = -\psi(\sigma_{\psi_1}(h))$,
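where the operator $\sigma_\psi : H_\infty \mapsto H_\infty$ is

(3.5)    $\sigma_\psi(h) = \begin{pmatrix} \sigma^{11}_\psi & \sigma^{12}_\psi & \sigma^{13}_\psi \\ \sigma^{21}_\psi & \sigma^{22}_\psi & \sigma^{23}_\psi \\ \sigma^{31}_\psi & \sigma^{32}_\psi & \sigma^{33}_\psi \end{pmatrix}\begin{pmatrix} h_1 \\ h_2 \\ h_3 \end{pmatrix} \equiv P_0\begin{pmatrix} \hat\sigma^{11}_\psi & \hat\sigma^{12}_\psi & \hat\sigma^{13}_\psi \\ \hat\sigma^{21}_\psi & \hat\sigma^{22}_\psi & \hat\sigma^{23}_\psi \\ \hat\sigma^{31}_\psi & \hat\sigma^{32}_\psi & \hat\sigma^{33}_\psi \end{pmatrix}\begin{pmatrix} h_1 \\ h_2 \\ h_3 \end{pmatrix} \equiv P_0\hat\sigma_\psi(h)$.

The operators $\sigma^{jk}_\psi = P_0\hat\sigma^{jk}_\psi$, for
$1 \le j, k \le 3$, are well defined and bounded, where

$\hat\sigma^{11}_\psi(h_1) = \xi^{(1)}_\psi h_1$,

$\hat\sigma^{21}_\psi(h_1) = \xi^{(2)}_\psi\int_0^\tau Z(s)Y(s)e^{\beta' Z(s)}\,dA(s)\,h_1$,

$\hat\sigma^{31}_\psi(h_1)(t) = \xi^{(2)}_\psi Y(t)e^{\beta' Z(t)}h_1$,

$\hat\sigma^{12}_\psi(h_2) = \xi^{(2)}_\psi\int_0^\tau Z'(s)h_2Y(s)e^{\beta' Z(s)}\,dA(s)$,

$\hat\sigma^{13}_\psi(h_3) = \xi^{(2)}_\psi\int_0^\tau h_3(s)Y(s)e^{\beta' Z(s)}\,dA(s)$,

$\hat\sigma^{22}_\psi(h_2) = \xi^{(0)}_\psi\int_0^\tau Z(s)Z'(s)h_2Y(s)e^{\beta' Z(s)}\,dA(s)$
$\quad + \xi^{(3)}_\psi\int_0^\tau Z(s)Y(s)e^{\beta' Z(s)}\,dA(s)\int_0^\tau Z'(s)h_2Y(s)e^{\beta' Z(s)}\,dA(s)$,

$\hat\sigma^{23}_\psi(h_3) = \xi^{(0)}_\psi\int_0^\tau Z(s)h_3(s)Y(s)e^{\beta' Z(s)}\,dA(s)$
$\quad + \xi^{(3)}_\psi\int_0^\tau Z(s)Y(s)e^{\beta' Z(s)}\,dA(s)\int_0^\tau h_3(s)Y(s)e^{\beta' Z(s)}\,dA(s)$,

$\hat\sigma^{32}_\psi(h_2)(t) = \xi^{(0)}_\psi Z'(t)h_2Y(t)e^{\beta' Z(t)}$
$\quad + \xi^{(3)}_\psi Y(t)e^{\beta' Z(t)}\int_0^\tau Z'(s)h_2Y(s)e^{\beta' Z(s)}\,dA(s)$,

$\hat\sigma^{33}_\psi(h_3)(t) = \xi^{(0)}_\psi h_3(t)Y(t)e^{\beta' Z(t)}$
$\quad + \xi^{(3)}_\psi Y(t)e^{\beta' Z(t)}\int_0^\tau h_3(s)Y(s)e^{\beta' Z(s)}\,dA(s)$,

and where, with every derivative of $G_\gamma$ evaluated at
$H^\psi(V \wedge \tau)$,

$\xi^{(0)}_\psi = \dot G_\gamma - \delta\,\frac{\ddot G_\gamma}{\dot G_\gamma}$,

$\xi^{(1)}_\psi = G^{(2)}_\gamma - \delta\left[\frac{\dot G^{(2)}_\gamma}{\dot G_\gamma} - \left\{\frac{\dot G^{(1)}_\gamma}{\dot G_\gamma}\right\}^2\right]$,

$\xi^{(2)}_\psi = \dot G^{(1)}_\gamma - \delta\left[\frac{\ddot G^{(1)}_\gamma}{\dot G_\gamma} - \frac{\dot G^{(1)}_\gamma\ddot G_\gamma}{\{\dot G_\gamma\}^2}\right]$

and

$\xi^{(3)}_\psi = \ddot G_\gamma - \delta\left[\frac{\dddot G_\gamma}{\dot G_\gamma} - \left\{\frac{\ddot G_\gamma}{\dot G_\gamma}\right\}^2\right]$,

where we also define $G^{(2)}_\gamma = dG^{(1)}_\gamma(t)/(d\gamma)$ and
$\dot G^{(2)}_\gamma = dG^{(2)}_\gamma(t)/(dt)$.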

To use the Z-estimator master theorem to obtain weak convergence in
Section 4.3, the Gateaux differentiability of $U_0^\tau$ needs to be
strengthened to Fréchet differentiability. Accordingly, we have the following
result.

Lemma 2. Under conditions (A1)-(A3), (B), (C) and (D1)-(D3), and
for any $\psi_1 \in \Psi$, the operator $\psi \mapsto U_0^\tau(\psi)$ is
Fréchet differentiable at $\psi_1$, with derivative
$-\psi(\sigma_{\psi_1}(h))$.

4. General results. 

4.1. Additional assumptions. Let
$S_0(t \mid z) = \int_t^\infty f_0(s \mid z)\,ds$, where $f_0$ is as defined in
condition (A2). Denote
$p_0(v, e \mid z) = f_0^e(v \mid z)S_0^{(1-e)}(v \mid z)$ and let
$\nu(v, e, z)$ be the implicitly defined measure for $(V, \delta, Z)$ such that
the true expectation of $g(X)$, denoted $P_0g$, can be written as
$\int_{\mathcal{X}} g\,p_0\,d\nu$, where $\mathcal{X}$ is the sample space for
$X$ and $g$ is measurable. Recall that the operator $\sigma_\psi$ was defined
in (3.5) for all $\psi \in \Psi$. We make the following assumptions about the
relationship between the posited frailty model and the true distribution:

(F) $P_0\log(p_\psi/p_0)$ has a unique maximum over $\psi \in \Psi$ at
$\psi_* = (\gamma_*, \beta_*, A_*) \in$ [...].

(G) $\sigma_{\psi_*} : H_\infty \mapsto H_\infty$ is one-to-one.

Remark 11. Assumption (F) is analogous to assumption A3(b) of White 
(1982) and is required for consistency, while condition (G) is analogous to 
assumption A6(b) of White (1982) and is required for asymptotic normal- 
ity. The lack of convexity of the Kullback-Leibler information in the posited 
frailty models generally prevents assumptions (F) and (G) from being di- 
rect consequences of the other conditions, except when the frailty model is 
correctly specified (Section 5) or when the true model is not too far from a 
member of the posited frailty model (Section 6.1). 

Remark 12. With a misspecified frailty model, the existence of the
implicitly defined $\psi_*$ does not guarantee its meaningfulness. In general,
$p_{\psi_*} \neq p_0$, and $\psi_* = \psi_0$ only when $p_{\psi_0} = p_0$. We show in Sections 6.2 and 6.3
that when $p_\psi$ is misspecified but assumption (F) holds, some of the
components of $\psi_*$ may sometimes be useful for inference about $p_0$.

4.2. Consistency. The theorem we now present establishes the
consistency of $\hat{\psi}_n$ under the identifiability assumed in (F).

Theorem 3. Under the conditions of Proposition 2 and condition (F),
$\hat{\psi}_n$ converges outer almost surely to $\psi_*$ in the uniform norm.

The following gives us the consistency of $\hat{\beta}_n$ under an important partial
identifiability setting not requiring condition (F).

Proposition 3. Assume the conditions of Proposition 2 and that
$f_0(t \mid Z = z_1) = f_0(t \mid Z = z_2)$ for all $t \in [0, \tau]$ and all possible values $z_1$ and $z_2$ of the
covariate process. Then $\hat{\beta}_n$ converges outer almost surely to 0.

Remark 13. An innovation in the proofs of Theorem 3 and Proposition 3
is that the existence and asymptotic boundedness of $\hat{A}_n$ are established
even when the model is misspecified or when condition (F) may not hold.
This was shown in Proposition 2 and Theorem 1, where it was demonstrated
that the asymptotic boundedness and equicontinuity of $\hat{A}_n$ depend only on
the structure of the data, the posited model and the general condition (A2),
but do not depend on any other aspects of the underlying true distribution.

4.3. Asymptotic normality. We use Hoffmann-Jørgensen weak convergence
as described in van der Vaart and Wellner (1996) (hereafter abbreviated
VW). We have the following result.

Theorem 4. Under the conditions of Theorem 3 and condition (G),
$\sqrt{n}(\hat{\psi}_n - \psi_*)$ is asymptotically linear, with influence function
$\tilde{\ell}(h) = U^{\tau}(\psi_*)(\sigma_{\psi_*}^{-1}(h))$, $h \in H_1$, converging weakly in the uniform norm to a tight,
mean-zero Gaussian process with covariance $V_*(g, h) = E[\tilde{\ell}(g)\tilde{\ell}(h)]$, $g, h \in H_1$.

Remark 14. In the proof in Section 9, the problem of establishing
weak convergence can be cleanly divided into establishing properties of
the data and fitted model (1.1), based on conditions (A1)-(A3), (B), (C)
and (D1)-(D3), and establishing properties of the Kullback-Leibler
discrepancy $P_0 \log(p_\psi / p_0)$, based on conditions (F) and (G), which involves the true
distribution of the censoring and covariates.
4.4. The bootstrap. The usual nonparametric bootstrap resamples with
replacement from the observed data. A disadvantage is that ties can arise
with censored survival data. We propose an alternative weighted bootstrap.
In each bootstrap sample one generates $n$ independent and identically
distributed nonnegative weights $\xi_1, \ldots, \xi_n$, with mean and variance 1 and with
$\int_0^{\infty} \sqrt{P[\xi_1 > x]}\,dx < \infty$. Each weight is divided by the average weight
(rejecting samples with all 0's) to obtain "standardized weights" $\xi_1^{\circ}, \ldots, \xi_n^{\circ}$ which
sum to $n$. Distributions satisfying the moment conditions include the unit
exponential and the Poisson with mean 1. For the nonparametric bootstrap
the weights $\xi_1', \ldots, \xi_n'$ are generated from a multinomial distribution with
$E\xi_i' = 1$, $i = 1, \ldots, n$, and $\sum_{i=1}^{n} \xi_i' = n$.

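For a known function $f$, let $\mathbb{P}_n^{\circ} f(V, \delta, Z; \psi) = n^{-1} \sum_{i=1}^{n} \xi_i^{\circ} f(V_i, \delta_i, Z_i; \psi)$
define the weighted empirical measure $\mathbb{P}_n^{\circ}$. The weighted bootstrap estimate
$\hat{\psi}_n^{\circ}$ is computed by substituting $\mathbb{P}_n^{\circ}$ for $\mathbb{P}_n$ in the expressions in Section 2.2
and maximizing over $\psi$. Note that $\mathbb{P}_n'$ is defined similarly to $\mathbb{P}_n^{\circ}$ with the
weights $\xi_1', \ldots, \xi_n'$ in place of $\xi_1^{\circ}, \ldots, \xi_n^{\circ}$. The nonparametric bootstrap
estimate $\hat{\psi}_n'$ is computed by using $\mathbb{P}_n'$ in place of $\mathbb{P}_n$ in Section 2.2.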
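
The two weighting schemes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, NumPy is assumed to be available, and the maximization of the weighted likelihood itself is not shown.

```python
import numpy as np

def standardized_weights(n, rng, dist="exponential"):
    """Draw n i.i.d. nonnegative weights with mean 1 and variance 1,
    then divide by their average so that the weights sum to n."""
    while True:
        if dist == "exponential":
            xi = rng.exponential(scale=1.0, size=n)
        elif dist == "poisson":
            xi = rng.poisson(lam=1.0, size=n).astype(float)
        else:
            raise ValueError("dist must be 'exponential' or 'poisson'")
        if xi.sum() > 0:  # reject a draw in which every weight is 0
            return xi / xi.mean()

def multinomial_weights(n, rng):
    """Nonparametric bootstrap weights: multinomial counts that sum to n."""
    return rng.multinomial(n, np.full(n, 1.0 / n)).astype(float)

def weighted_empirical_mean(f_values, weights):
    """n^{-1} * sum_i weight_i * f_i, i.e. the weighted empirical measure of f."""
    f_values = np.asarray(f_values, dtype=float)
    return float(np.mean(weights * f_values))
```

A bootstrap replicate would then be obtained by passing one draw of weights to whatever routine maximizes the weighted likelihood; repeating this many times gives the bootstrap distribution.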

The following result establishes the validity of both the nonparametric 
and the weighted bootstraps. 

Corollary 1. Under the assumptions of Theorem 4, the conditional
bootstrap of $\hat{\psi}_n$, based either on $\hat{\psi}_n'$ or $\hat{\psi}_n^{\circ}$, is asymptotically consistent for
the limiting process $Z_*$. That is, $\sqrt{n}(\hat{\psi}_n' - \hat{\psi}_n)$ and $\sqrt{n}(\hat{\psi}_n^{\circ} - \hat{\psi}_n)$ are
asymptotically measurable,

(i) $\sup_{g \in BL_1} |E_{\bullet}\, g(\sqrt{n}(\hat{\psi}_n' - \hat{\psi}_n)) - E g(Z_*)| \to 0$ in outer probability, and

(ii) $\sup_{g \in BL_1} |E_{\circ}\, g(\sqrt{n}(\hat{\psi}_n^{\circ} - \hat{\psi}_n)) - E g(Z_*)| \to 0$ in outer probability,

where $BL_1$ is the space of functions mapping $\mathbb{R}^{d+1} \times \ell^{\infty}([0, \tau]) \mapsto \mathbb{R}$ with
Lipschitz norm $\le 1$, and conditional on the data $E_{\bullet}$ and $E_{\circ}$ are expectations
over the multinomial and standardized weights, respectively.

Remark 15. While the choice of $\{\xi_i\}$ in the weighted bootstrap has no
effect asymptotically, the rate of convergence may be affected. Newton and 
Raftery (1994) discuss different choices in the context of parametric max- 
imum likelihood. They demonstrate that unit exponential weights, which 
are Dirichlet after standardizing, perform well. Our own experience is that 
exponential weights also work well for semiparametric inference. A detailed 
analysis of the distribution of the weights is beyond the scope of this paper. 

Remark 16. An advantage of using Z-estimator theory for establish- 
ing weak convergence of estimators for likelihood inference under possible 
model misspecification is that consistency of the bootstrap is essentially an 
immediate consequence of the influence function being $P_0$-Donsker.
5. Results under correctly specified model. The focus of this section is
on the behavior of $\hat{\psi}_n$ when the frailty regression model is correctly specified.
Accordingly, $\psi_* = \psi_0$ throughout this section. In Section 5.1 we establish
that the identifiability condition [condition (F)] holds. In Section 5.2 the
injectiveness of $\sigma_{\psi_0}$ [condition (G)] is established and shown to imply that
$\hat{\psi}_n$ is both regular and efficient. In addition to assuming that $\Lambda_\gamma$ is correctly
specified, we make the following assumption:

(H) 70 G [0,mi), / 0o G -Bo and A Q G A with a > 0. 

Remark 17. Condition (H) is also assumed by Parner (1998). When
$\beta_0 = 0$, the survival function $S_0(t)$ does not depend on covariates, and we
have the situation considered in Proposition 3. Moreover, we have that for
any $\gamma > 0$, there exists an $A_\gamma \in \mathcal{A}_0$ so that $S_0(t) = \Lambda_\gamma(A_\gamma(t))$. Thus, $\gamma$ and $A$
are not identifiable when $\beta_0 = 0$.

5.1. Identifiability. Nonparametric identifiability of the mixed proportional
hazards model, under the assumption that $\beta' Z$ takes on at least two
distinct values, has been established for right-, left- and double-censored
data with finite-mean frailties by Kortram, van Rooij, Lenstra and Ridder
(1995). For earlier related work, see also Heckman and Taber (1994),
Elbers and Ridder (1982) and Heckman and Singer (1984). In our case, $\Lambda_\gamma$
is parametric rather than completely unspecified and may not have the
interpretation of being the Laplace transform of a frailty when $\gamma < 0$. The
following proposition establishes uniqueness of the model.

Proposition 4. Under conditions (A1)-(A3), (B), (C), (D1)-(D3) and (H),
model (2.1) is identifiable over $\Psi_0$, and thus condition (F) is satisfied.

Remark 18. The monotonicity in $\gamma$ of $\ddot{G}_\gamma(0+)$, where $\ddot{G}_\gamma(t) = -\partial^2 \log \Lambda_\gamma(t) / (\partial t)^2$,
as given in condition (D1), is the key to establishing identifiability of the
extended Laplace transform. Since $-\ddot{G}_\gamma(0+)$ is the variance of $W$ when $\gamma > 0$,
this is the same as requiring that $\mathrm{var}[W]$ be a monotone function of $\gamma$.
The positive stable frailty model violates this condition and is not
identifiable without clustered data as noted in Remark 6 above. Because of
Proposition 1, the gamma, inverse Gaussian, log-normal and IGG($\alpha$)
frailties are identifiable.

5.2. Efficiency. The main result of this section is as follows. 

Theorem 5. The information operator $\sigma_{\psi_0}$ is one-to-one. Thus, $\sigma_{\psi_0}$ is
continuously invertible, condition (G) is satisfied and $\hat{\psi}_n$ is a regular and
efficient estimator of $\psi_0$ when $\gamma_0 \ge 0$ and censoring is uninformative of $\psi$.
The limiting covariance for $\sqrt{n}(\hat{\psi}_n - \psi_0)$ is $g(\sigma_{\psi_0}^{-1}(h))$, $g, h \in H_1$.
Remark 19. For the shared gamma frailty regression model, Parner
(1998) suggests inference based on estimating the covariance. This is done
through first estimating $\sigma_{\psi_0}$ by plugging in $\hat{\psi}_n$ for $\psi_0$ and then inverting,
considering only the parameters $\gamma$, $\beta$ and $\Delta A$ at observed failure times.
Because this approach may be difficult to implement with general $f(w; \gamma)$ and
does not readily enable the construction of confidence bands, the bootstrap
is recommended for inference. By Corollary 1, Theorem 5 implies that the
bootstrap will yield valid inferences.

Remark 20. The proof of Theorem 5 draws heavily on the tangent
set $H_r$, as defined in the proof of Theorem 4. The issue is showing that,
for $h \in H_r$, $\sigma_{\psi_0}(h) = 0$ implies $h = 0$. This gives that $\sigma_{\psi_0}$ is continuously
invertible and onto, and thus the influence function is contained in the closed
linear span of the score operator, yielding the given covariance. Regularity
and efficiency then follow from Theorems 5.2.3 and 5.2.1 of BKRW.

Remark 21. An alternative estimator to $\hat{\psi}_n$ is to take

$$\check{\psi}_n = \begin{cases} \hat{\psi}_n, & \text{if } \hat{\gamma}_n > 0, \\ (0, \tilde{\beta}_n, \tilde{A}_n), & \text{otherwise,} \end{cases}$$

where $\tilde{\beta}_n, \tilde{A}_n$ are the estimates based on the Cox model. It is not difficult
to show that, when $\gamma_0 > 0$, $\check{\psi}_n$ has the same limiting behavior as $\hat{\psi}_n$, but
when $\gamma_0 = 0$, the limiting distribution of $\sqrt{n}(\check{\psi}_n - \psi_0)$ is a mixture of the
limiting distribution of $\sqrt{n}(\hat{\psi}_n - \psi_0)$ and the limiting distribution under the
Cox model (with a 0 in the $\gamma$ component), each with probability 1/2. This
alternative estimator is thus more precise when $\gamma_0 = 0$. It also follows without
difficulty that the conditional limit law of the bootstrap which imitates this
estimation procedure is equal to the limit law of $\sqrt{n}(\check{\psi}_n - \psi_0)$ in the sense
of Corollary 1.

6. Results under model misspecification. In this section, we examine
conditions under which the model is misspecified but the parameter
estimates are consistent and asymptotically Gaussian, and some of the
components of the estimated quantity may be interpreted via $p_0$. In Section 6.1
we demonstrate that if the posited conditional survival distribution, based
on the chosen frailty transform $\Lambda_\gamma$, is not too badly misspecified, then
conditions (F) and (G) are satisfied under certain restrictions on the index
set $\Psi$. In Section 6.2, we examine testing for the effect of a single
covariate under misspecification. In Section 6.3 we study $\gamma$ and $\beta$ under
misspecification with structural requirements on the covariates.
6.1. Existence of unique Kullback-Leibler maximizers. Define

*M = {^= (7, P, A) : 7 E (-eo(^A(r)),mi), 

/? G i?o, A G ^ 7) , 1/M < a < M}, 

where $1 < M < \infty$, $\varepsilon_0(\cdot)$ is as defined in condition (C) and $K_0$ is as defined
in Section 2.1. Let $D(\nu)$ be the space of all conditional densities $k(v \mid z)$ such
that $k^{e}(v \mid z) L(v \mid z)^{1-e}$, where $L(v \mid z) = \int_v^{\infty} k(u \mid z)\,du$, is $\nu$-measurable. Denote
$f_\psi(v \mid z) = p_\psi(v, 1 \mid z)$. Also, for $f \in D(\nu)$, let $p_{(f)}(v, e \mid z) = f^{e}(v \mid z) S^{1-e}(v \mid z)$,
where $S(v \mid z)$ is the survival function corresponding to $f(v \mid z)$. The main
result of this section is as follows.

Theorem 6. Assume that conditions (A1)-(A3), (B), (C), (D1)-(D3)
and (H) are satisfied by the data and the posited frailty distribution. Then
for every $Q < \infty$ there exists an $\epsilon > 0$ such that for each conditional density
f E D{y), with f <Q v-almost surely and f x \f — f^\dv < e for some ip G 
^> M , there exists a unique Kullback-Leibler maximizer tprn = argmax^g^ Pr^ x 
log(p^»/p(/)) G $m, with Oip U) = P^a^ (f) : \-* being one-to-one, where 
p (f)9 = fx 9P(f) du. 

Remark 22. This theorem tells us that for any given class of proportional
hazards frailty regression models parameterized by $\Psi_M$ and satisfying
the stated regularity conditions, there exist infinitely many true models
not agreeing with the posited model but which satisfy conditions (F) and (G)
with $\psi_* = (\gamma_*, \beta_*, A_*) = \psi_{(f)}$. In other words, conditions (F) and (G) are
satisfied when the posited frailty family is sufficiently close to the true
distribution. This is important in misspecified frailty model settings where
uniqueness is not guaranteed by convexity. Note that without the condition
bounding the true densities by $Q$, the Kullback-Leibler discrepancy between
the true and posited models may be unbounded even when the respective
densities are quite close in $L_1(\nu)$.

Remark 23. It is worth emphasizing that $\gamma_*$ may be less than 0. In
practice, one might mistakenly assume that the model is correctly specified
and constrain the maximization to be over the subset of $\Psi_M$ for which $\gamma \ge 0$.
Denote the resulting maximizer $\check{\psi}_* = (\check{\gamma}_*, \check{\beta}_*, \check{A}_*)$ and assume it is unique.
The results in Section 2 can then be redone for the estimator $\check{\psi}_n$ defined in
Remark 21. In the general setting, $\check{\psi}_n$ will be uniformly consistent for $\check{\psi}_*$,
and $\sqrt{n}(\check{\psi}_n - \check{\psi}_*)$ will have three possible limiting distributions: when $\check{\gamma}_* > 0$,
the limiting distribution is a Gaussian process as given in Theorem 4; when
$\check{\gamma}_* = 0$ and the $\gamma$ term in $U_0(\check{\psi}_*) = 0$, the limiting distribution is a mixture
of two Gaussian processes similar to the mixture described in Remark 21;
and when $\check{\gamma}_* = 0$ but the $\gamma$ term in $U_0(\check{\psi}_*) < 0$, the limiting distribution for
$\sqrt{n}(\check{\gamma}_n - \check{\gamma}_*)$ is a point mass at 0 while the remaining components have the
limiting distribution resulting from assuming the Cox model ($\gamma_* = 0$). It is
not possible, under the stated regularity conditions, to have $\check{\gamma}_* = 0$ but the
$\gamma$ term in $U_0(\check{\psi}_*) > 0$, since this would imply that $\check{\gamma}_* > 0$. It also can be
shown that the conditional limit law of the bootstrap, which imitates the
foregoing estimation procedure, is equal to the limit law of $\sqrt{n}(\check{\psi}_n - \check{\psi}_*)$ in
the sense of Corollary 1. Hence, when the bootstrap distribution of $\check{\gamma}_n$ under
this constraint is frozen at 0, there is significant evidence against the frailty
model being correctly specified.

6.2. Identifying an independent covariate effect under a misspecified model.
In this section, we examine the effect of testing for a univariate covariate
effect in the presence of other covariates and frailties when the posited frailty
distribution and some of the covariates may be misspecified. We assume
throughout this section that the data satisfy conditions (A1'), (A2) and
(A3') for the covariate process $Z = (Z_1, Z_2)$ and that the posited model
satisfies conditions (B), (C), (D1), (D2) and (E1). We allow $Z_2$ to be misspecified
and $Z_1$ to be partly misspecified, in that we only assume $P_0\{T \le (\cdot) \mid Z_1 = z_1\}$
is monotone in $z_1$.

The following is the main result of this section. 

Theorem 7. Assume conditions (A1'), (A2), (A3'), (B), (C), (D1)-(D3)
and (E1) hold for the data and posited frailty model. Denote $F_0^{z_1}(t) =
P[T \le t \mid Z_1 = z_1]$ and assume that $F_0^{z_1}(\cdot)$ is monotone in $z_1$ almost surely.
The covariates may be otherwise misspecified. Also assume condition (F)
holds with Kullback-Leibler maximizer $\psi_* = (\gamma_*, \beta_* = (\beta_{*1}, \beta_{*2}), A_*) \in \Psi_0$,
where $\gamma_* \ge 0$. Then (i) if $F_0^{z_1}$ is constant in $z_1$, $\beta_{*1} = 0$; (ii) if $F_0^{z_1}$ is strictly
increasing in $z_1$, $\beta_{*1} > 0$; and (iii) if $F_0^{z_1}$ is strictly decreasing in $z_1$, $\beta_{*1} < 0$.

Remark 24. If we interpret the covariate effect of $Z_1$ to be positive
when $F_0^{z_1}$ is increasing in $z_1$, negative when decreasing, 0 when constant and
ambiguous otherwise, then Theorem 7 implies that the covariate effect can
be consistently estimated up to the correct sign even if both $Z_1$ and $Z_2$ are
otherwise misspecified. If condition (G) also holds, then the score and Wald
tests for $H_0 : \beta_{*1} = 0$ will be valid for testing the covariate effect of $Z_1$. These
results generalize Kong and Slud (1997) to more general misspecification
when fitting the more general class of models (1.1).

Remark 25. Note that we require $\gamma_* \ge 0$. This is because condition (E1)
appears to be needed for Theorem 7, and this condition only works when
$\gamma \ge 0$. This requirement is stronger than necessary for the consistency and
asymptotic normality results for possibly misspecified models given in
Section 4.
6.3. Coefficient effects under misspecified models. In this section we
examine the behavior of regression parameter estimates under a misspecified
frailty distribution and stronger conditions on the covariates. We assume
that the data and posited frailty distribution satisfy (A1'), (A2), (A3"),
(B), (C), (D1)-(D3) and (E1).

Remark 26. Conditional linearity [condition (A3")] is also used by Li 
and Duan (1989) in their study of regression analysis under link violation. 
Their results apply to fitting parametric models based on maximum likeli- 
hood and to semiparametric Cox models based on partial likelihood without 
censoring. Brillinger (1983) used this assumption to study unobserved Gaus- 
sian regressor variables in generalized linear models. In contrast, our results 
apply to semiparametric frailty regression models under frailty misspecifi- 
cation using nonparametric maximum likelihood with or without censoring. 
While condition (A3") is sufficient, it may not be necessary for the results be- 
low. 

We have the following proposition which extends Li and Duan (1989) to 
censoring when the true model has the form (1.1). 

Proposition 5. Assume conditions (A1'), (A2) and (A3") hold for
the data, the posited model is a Cox proportional hazards model with
$\beta \in \mathbb{R}^{d}$, and the true failure time distribution satisfies a proportional hazards
frailty regression model with parameter $\psi_0 = (\gamma_0, \beta_0, A_0) \in \Psi_0$, where $\gamma_0 \ge 0$
and where the corresponding true negative log frailty transform family
(denoted $\{G_\gamma^{0}\}$) satisfies conditions (C), (D1)-(D3) and (E1). Then
conditions (F) and (G) are satisfied with $\beta_* = \alpha_1 \beta_0$, where $\alpha_1 = 1$ if $\gamma_0 = 0$ and
$\alpha_1 \in (0, 1)$ if $\gamma_0 > 0$.

The following result establishes consistency up to scale when fitting (1.1) 
with a misspecified Laplace transform. 

Proposition 6. Assume conditions (A1'), (A2) and (A3") hold for
the data; the posited and true proportional hazards frailty regression models
satisfy conditions (B), (C), (D1)-(D3) and (E1) for the common index set $\Psi$,
where the posited negative log frailty transform family is denoted $\{G_\gamma\}$ and
the true family is denoted $\{G_\gamma^{0}\}$; and the true parameter value for the true
model is $\psi_0 = (\gamma_0, \beta_0, A_0) \in \Psi_0$, where $\gamma_0 \ge 0$. Also assume condition (F)
holds with Kullback-Leibler maximizer $\psi_* = (\gamma_*, \beta_*, A_*) \in \Psi_0$, where $\gamma_* \ge 0$.
Then $\beta_* = \alpha_2 \beta_0$, where $\alpha_2 = \alpha_1$ when $\gamma_* = 0$, $\alpha_2 > 0$ when $\gamma_* > 0$ and $\alpha_1$ is
as defined in Proposition 5.
Remark 27. If conditional linearity is violated, it may be possible to
perform a reweighted maximum likelihood estimation procedure, based on 
weights described in Cook and Nachtsheim (1994). The resulting estimator 
would have the properties described in Proposition 6 and could be employed 
as a diagnostic for (A3") by comparing to the unweighted estimator. 

Remark 28. Under the stated regularity conditions, these results show 
that using the Cox model when an unobserved frailty is present results in an 
estimate which is an attenuation of the true effect. When fitting (1.1) with 
the Laplace transform correctly specified, there is a deattenuation relative to 
the Cox model as a consequence of model identifiability. With a misspecified 
frailty distribution, the correct direction is obtained. However, it is unclear 
whether the effect size is deattenuated relative to the Cox model. 

Remark 29. One can test whether the Cox model is an attenuation of
the true effect, that is, $\alpha_1 < 1$, if the score test for $H_0 : \gamma_0 = 0$ remains valid
and consistent under misspecification of the frailty distribution. Proving this
in generality appears to be quite difficult, but the following result is a step
in the right direction.

Proposition 7. Assume conditions (A1'), (A2) and (A3") hold for the
data and the posited and true proportional hazards frailty regression models
satisfy conditions (B), (C), (D1)-(D3), (E1) and (E2) for the common index
set $\Psi$. Then the score test for $H_0 : \gamma_0 = 0$ based on the posited model is valid
and consistent under positive contiguous alternatives.

Remark 30. Proposition 7 points out that this score test has the same
form at $\gamma = 0$ under both correctly and incorrectly specified frailty models.
Thus, to ensure that this score test is consistent for the fixed alternative
$H_1 : \gamma_0 > 0$, one would need to establish that the profile likelihood for $\gamma$,
profiling over $\beta$ and $A$, is convex over the region $[0, m_1]$ when the model is
correctly specified. This appears to be very challenging analytically.

7. Computational issues. We implemented the profile likelihood estimation
method of Section 3.1, along with the bootstrap procedure of Section 4.4
based on Dirichlet weights, for the gamma and inverse Gaussian frailty
models. We did not implement the log-normal frailty model because of the
additional computational burden resulting from $\Lambda_\gamma$ not having a closed form.
Although this issue can be addressed using Monte Carlo quadrature
methods, we do not pursue it further here. To estimate the parameters in the
gamma and inverse Gaussian frailty models, we maximized the nonparametric
likelihood via profiling. A simple random search method based on
the Metropolis-Hastings algorithm was used to maximize $pL_n(\theta)$ over $\theta$.






For each candidate value of $\theta$, the fixed-point equation given in (3.2) was
iterated until stabilization to obtain $\hat{A}_\theta$. Some simplification of $J_n^{\psi}$ occurs
for these two frailty models. For the gamma frailty,

$$J_n^{\psi}(t) = \mathbb{P}_n\left[ \frac{Y(t)\, e^{\beta' Z(t)} (1 + \gamma\delta)}{1 + \gamma H^{\psi}(V)} \right],$$

and for the inverse Gaussian frailty,

$$J_n^{\psi}(t) = \mathbb{P}_n\left[ \frac{Y(t)\, e^{\beta' Z(t)} \left( \{1 + 2\gamma H^{\psi}(V)\}^{1/2} + \gamma\delta \right)}{1 + 2\gamma H^{\psi}(V)} \right].$$

When the candidate value of $\gamma$ was negative, the likelihood was considered 0
if $G_\gamma(H^{(\theta, \hat{A}_\theta)}(V))$ was either negative or undefined for any data point.
Overall, we found that this procedure was accurate and computationally efficient
at finding the maximum, with or without bootstrap weights.
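
As an illustration of the inner step for the gamma frailty, the sketch below iterates a fixed-point update for the jumps of the baseline cumulative hazard at a fixed candidate $(\gamma, \beta)$. It rests on assumptions that are not taken from the text: equation (3.2) is not reproduced in this section, so the update is assumed to have the form $\Delta A(V_j) = \delta_j / \{n J_n(V_j)\}$ with the gamma frailty weight $(1 + \gamma\delta)/(1 + \gamma H)$; covariates are treated as time-independent; and the function name and arguments are hypothetical.

```python
import numpy as np

def gamma_frailty_baseline(times, delta, lin_pred, gamma, tol=1e-10, max_iter=5000):
    """Iterate the assumed fixed-point update for the jumps of A.

    times    : observed times V_i
    delta    : failure indicators (1 = failure, 0 = censored)
    lin_pred : beta'Z_i for each subject (time-independent covariates assumed)
    gamma    : candidate frailty variance
    Returns the jumps of A at each V_j and the values A(V_i).
    """
    times = np.asarray(times, dtype=float)
    delta = np.asarray(delta, dtype=float)
    risk = np.exp(np.asarray(lin_pred, dtype=float))           # e^{beta'Z_i}
    # at_risk[j, i] = 1 when subject i is still at risk at time V_j
    at_risk = (times[None, :] >= times[:, None]).astype(float)
    jump = delta / len(times)                                   # starting value
    for _ in range(max_iter):
        H = risk * (jump @ at_risk)                             # H_i = e^{beta'Z_i} A(V_i)
        if np.any(1.0 + gamma * H <= 0.0):
            raise ValueError("1 + gamma * H must stay positive")
        weight = (1.0 + gamma * delta) / (1.0 + gamma * H)      # gamma frailty weight
        denom = at_risk @ (risk * weight)                       # n * J_n(V_j)
        new_jump = delta / denom
        converged = np.max(np.abs(new_jump - jump)) < tol
        jump = new_jump
        if converged:
            break
    return jump, jump @ at_risk
```

With `gamma = 0` every weight equals 1 and the update reduces to the Breslow estimator, which gives a simple check of the sketch. An outer search over $(\gamma, \beta)$ would call this routine at each candidate value.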

8. Example: non-Hodgkin's lymphoma data. The data are a subset of
1385 patients with aggressive non-Hodgkin's lymphoma (NHL), from 16
institutions and cooperative groups in North America and Europe. These
patients were treated with a particular chemotherapy regimen. Survival was
documented from start of treatment until either death or loss to follow-up.
The censoring rate was 54.7%. Information on the following pretreatment
covariates is complete for all patients in the subset: age at the diagnosis of
NHL ($\le 60$ or $> 60$ years), performance status (ambulatory or nonambulatory),
serum lactate dehydrogenase level (below normal or above normal),
number of extranodal disease sites ($\le 1$ or $> 1$) and Ann Arbor classification
of tumor stage [stage I or II (localized disease) or stage III or IV (advanced
disease)]. Each characteristic is coded 0 for the first group in the
parentheses and 1 for the second. These dichotomous predictors are the basis for
the original model [Non-Hodgkin's Lymphoma Prognostic Factors Project
(1993)]. A clinical reason for using dichotomous predictors is that it provides
a simple classification of risk based on only a finite set of risk groups.
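
The 0/1 coding just described can be written out as a small helper. This is a hypothetical sketch: the function and argument names are illustrative and not taken from the study, and the ordering of the components (age, LDH level, performance status, extranodal sites, stage) is an assumption.

```python
def code_covariates(age_years, ldh_above_normal, ambulatory, extranodal_sites, stage):
    """Return the five 0/1 indicators: 0 for the first group listed in the
    text and 1 for the second (assumed ordering of components)."""
    return (
        int(age_years > 60),            # age: 60 or younger -> 0, older than 60 -> 1
        int(ldh_above_normal),          # LDH: below normal -> 0, above normal -> 1
        int(not ambulatory),            # status: ambulatory -> 0, nonambulatory -> 1
        int(extranodal_sites > 1),      # sites: at most one -> 0, more than one -> 1
        int(stage in ("III", "IV")),    # stage: I or II -> 0, III or IV -> 1
    )
```

For example, a patient older than 60 with every other factor in the favorable group is coded `(1, 0, 0, 0, 0)`.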

We now illustrate the utility of the procedures described in Section 7 for
the gamma and inverse Gaussian frailty models. The bootstrap procedure
based on the Dirichlet weights was repeated 500 times for inference.
Parameter estimates, standard errors and Z values for the parameters in the gamma
frailty model both with unknown $\gamma$ (GF) and with the estimated value of $\gamma$
treated as known (GF$_0$), the inverse Gaussian frailty model with unknown $\gamma$
(IGF), the proportional odds model and the Cox model are given in Table 1.
While the coefficient estimates are the same for GF and GF$_0$, the difference
is that the standard errors for GF$_0$ are based on bootstrap estimates from a
model with fixed $\gamma = 2.197$. The GF$_0$ standard errors are helpful in
assessing the bias in precision estimation due to assuming $\gamma$ known. This bias is
generally nontrivial and should not be ignored in practice.
Table 1

Parameter estimates for the non-Hodgkin's lymphoma data using the gamma frailty model
with $\gamma$ both unknown (GF) and fixed at the estimated value (GF$_0$), the inverse Gaussian
frailty model with $\gamma$ unknown (IGF), the proportional odds model (PO) and the Cox
model (PH)

Covariate   Parameter   Model    Estimate   S.E.    Z value
            $\gamma$    GF       2.197      ?       4.914
                        GF$_0$   2.197*
                        IGF      4.325      ?       ?
                        PO       1*
                        PH       0*
Age         $\beta_1$   GF       1.100      0.139   ?
                        GF$_0$   1.100      ?       ?
                        IGF      ?          ?       7.291
                        PO       ?          ?       ?
                        PH       ?          ?       7.796
Level       $\beta_2$   GF       1.024      0.159   6.433
                        GF$_0$   1.024      0.144   7.155
                        IGF      0.933      0.143   6.524
                        PO       0.833      0.120   6.967
                        PH       0.624      0.092   6.796
Status      $\beta_3$   GF       1.291      0.210   6.136
                        GF$_0$   1.291      0.166   7.779
                        IGF      0.994      0.147   6.767
                        PO       0.949      0.145   6.562
                        PH       0.586      0.098   5.958
Sites       $\beta_4$   GF       0.694      0.156   4.444
                        GF$_0$   0.694      0.149   4.644
                        IGF      0.622      0.139   4.464
                        PO       0.546      0.122   4.458
                        PH       0.394      0.092   4.272
Stage       $\beta_5$   GF       0.584      0.158   3.693
                        GF$_0$   0.584      0.154   3.779
                        IGF      0.545      0.148   3.680
                        PO       0.485      0.137   3.549
                        PH       0.369      0.104   3.560

*$\gamma$ is fixed at the given value.
? marks an entry that is not legible in the source.



The attenuation of the covariate effects in the Cox model, predicted in
Section 6.3, is evident in the results, although the attenuation does not
appear to be uniform across all covariates. For status, the ratio of the parameter
coefficient under the gamma frailty model to the coefficient under the Cox
model is about 2.2, while the corresponding ratio for stage is only about 1.6.
This difference may be related to the fact that the ratio of the standard
errors of the status coefficient estimates for GF to GF$_0$ is about 1.27, while
the corresponding ratio for stage is only 1.03. An anonymous referee has
suggested that the Cox attenuation phenomenon for a covariate effect may
depend on the degree to which that covariate's parameter estimate is
correlated with the frailty variance estimate. Except for the GF$_0$ results, the Z
values for the covariate effects are fairly stable across models.
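
The ratios quoted above can be reproduced directly from the status and stage rows of Table 1. The short sketch below does only that arithmetic; the dictionary layout and the helper name are illustrative choices.

```python
# Estimates and bootstrap standard errors copied from the status and stage rows of Table 1.
estimate = {
    "status": {"GF": 1.291, "PH": 0.586},
    "stage": {"GF": 0.584, "PH": 0.369},
}
std_err = {
    "status": {"GF": 0.210, "GF0": 0.166},
    "stage": {"GF": 0.158, "GF0": 0.154},
}

def ratio(table, covariate, numerator, denominator):
    """Ratio of two entries in the same covariate row."""
    return table[covariate][numerator] / table[covariate][denominator]

for name in ("status", "stage"):
    coef_ratio = ratio(estimate, name, "GF", "PH")    # deattenuation relative to Cox
    se_ratio = ratio(std_err, name, "GF", "GF0")      # cost of estimating gamma
    print(name, round(coef_ratio, 2), round(se_ratio, 2))
```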

The estimated frailty variances in GF and IGF are 2.197 and 4.325,
respectively, which are significantly higher than that assumed by the Cox
model ($\gamma = 0$) and the proportional odds model ($\gamma = 1$). The maximized
log profile likelihood values for the GF, IGF, PO and PH models are
-4618.39, -4628.37, -4623.30 and -4688.40, respectively. This suggests
that GF provides the best fit to the data and that $\gamma$ is significantly greater
than 1 (p = 0.0074 via the two-sided Wald test based on the bootstrap).
Also, PO is better than IGF, even though IGF is seemingly more flexible.
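
To make the Wald comparison concrete, the following sketch back-calculates the bootstrap standard error of the GF estimate of $\gamma$ that is implied by the reported p-value, and tabulates how far each model's maximized log profile likelihood falls below that of GF. It uses only numbers quoted in the text; the normal reference distribution for the Wald statistic and the helper names are assumptions of the sketch.

```python
from statistics import NormalDist

gamma_hat = 2.197      # GF estimate of the frailty variance (from the text)
p_reported = 0.0074    # reported two-sided Wald p-value for H0: gamma = 1

def implied_standard_error(estimate, null_value, p_two_sided):
    """Standard error that reproduces a given two-sided Wald p-value."""
    z = NormalDist().inv_cdf(1.0 - p_two_sided / 2.0)
    return abs(estimate - null_value) / z

def wald_p_value(estimate, null_value, std_err):
    """Two-sided p-value of the Wald statistic (estimate - null) / std_err."""
    z = abs(estimate - null_value) / std_err
    return 2.0 * (1.0 - NormalDist().cdf(z))

se_gamma = implied_standard_error(gamma_hat, 1.0, p_reported)
p_vs_cox = wald_p_value(gamma_hat, 0.0, se_gamma)   # same standard error, null gamma = 0

# Maximized log profile likelihoods quoted in the text.
loglik = {"GF": -4618.39, "IGF": -4628.37, "PO": -4623.30, "PH": -4688.40}
gap_to_gf = {model: loglik["GF"] - value for model, value in loglik.items() if model != "GF"}
```

The implied standard error is roughly 0.45, and the gaps show PO lying closer to GF than IGF does, in line with the ordering described above.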

In Figure 1, we plot the Kaplan-Meier estimates of the marginal survival
distributions for the LDH level and performance status groups. The
estimates from GF and the Cox model are also displayed. The survival estimate
in a group (e.g., patients with status = 0) from GF is $\prod_{0 < s \le t} \{1 - \Delta\hat{H}(s)\}$,
where



H(t) 



Ei^OOexpi&ZiOO} 



X ( 1+Afn j eX P{P'n Z i( U )} d An(u)^j dA n ( S ), 



and summation is over all observations in the group. That is, $\hat{H}(t)$ is a
model-based estimate of the cumulative hazard which averages over the
observed covariate distribution in that group. The estimate based on the Cox
model uses the partial likelihood estimator, Breslow's estimator and 0 in the
place of $\hat{\beta}_n$, $\hat{A}_n$ and $\hat{\gamma}_n$, respectively, in $\hat{H}$. The reason the Kaplan-Meier
curves may be quite different from the model-based curves in the tail is that
there are fewer observations in the subgroups available for the Kaplan-Meier
curves, whereas the model-based curves utilize all of the data. In general, the
GF estimates are closer to the Kaplan-Meier curves than the proportional
hazards fit, particularly with the performance status group comparison. This
demonstrates the superior fit of the frailty model. We also examined the
survival curve estimates based on the proportional odds model and found them
to be intermediate between GF and the Cox model. These are omitted from
the figures for clarity.
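
The product-limit step that turns the jumps of a cumulative hazard estimate into a survival curve can be sketched as follows. The function name is hypothetical, and the jumps are assumed to be supplied in increasing time order.

```python
import numpy as np

def survival_from_hazard_jumps(delta_H):
    """Product-limit survival curve: the running product of 1 - (jump of H),
    evaluated just after each jump time."""
    delta_H = np.asarray(delta_H, dtype=float)
    if np.any((delta_H < 0.0) | (delta_H > 1.0)):
        raise ValueError("hazard jumps must lie in [0, 1]")
    return np.cumprod(1.0 - delta_H)
```

For instance, jumps of 0.1 and then 0.2 give survival values 0.9 and 0.72.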

Fig. 1. Estimated marginal survival distributions of non-Hodgkin's lymphoma patients
for LDH level (a) and performance status (b) under the gamma frailty model. The
Kaplan-Meier estimates and Cox model estimates are included for comparison.
[Panels: (a) LDH level, (b) Performance status; horizontal axis: survival time in years.]

Next, we illustrate the robust inference procedure for the best fitting
survival probabilities under the assumed gamma frailty model. Survival
predictions for two covariate values, representing an elderly high-risk patient
[$Z = (1,1,1,1,1)'$] and an elderly low-risk patient [$Z = (1,0,0,0,0)'$], are
shown in Figure 2. Also shown are 95% simultaneous confidence bands for
the GF prediction using 500 multiplier bootstrap samples with Dirichlet
weights. The Cox proportional hazards survival predictions are included for
comparison. The Cox prediction for the high-risk patient significantly
underestimates the long-term survival probability relative to GF. The difference
between the Cox and GF predictions is less pronounced for the low-risk
patient. Some improvement in the model fit may be possible if continuous
rather than dichotomous covariates are used, but we do not pursue this
further here.
9. Proofs.

Fig. 2. Survival predictions and 95% simultaneous confidence bands for the gamma frailty
model: (a) high risk, $Z = (1, 1, 1, 1, 1)'$; (b) low risk, $Z = (1, 0, 0, 0, 0)'$. The Cox model
predictions are included for comparison. [Horizontal axis: survival time in years.]

Proof of Proposition 1. For any IGG($\alpha$) frailty family, the
conditions hold with $\varepsilon_0(t) = (2/3)(1 \vee t)^{-1}$, $c_1(\gamma) = 1/\gamma$ and $c_2(\gamma) = (1 - \alpha)/\gamma$,
where $a \vee b$ is the maximum of $a$ and $b$. Establishing these results is
straightforward. For the log-normal, we now show that the conditions hold with
$\varepsilon_0(t) = (1 \vee t)^{-4}/64$, $c_1(\gamma) = 1$ and $c_2(\gamma) = 1/\gamma$. Complex analysis is involved
since $\sqrt{\gamma}$ is imaginary for $\gamma < 0$. However, the imaginary components of $\Lambda_\gamma$,
$G_\gamma$ and their derivatives are all 0. Moreover, $\Lambda_\gamma(t)$ and its first two
derivatives in $t$ have the following form, with $\xi = \sqrt{-\gamma}$ and $u = t e^{\xi^2/2}$:

$$(-1)^k e^{k\xi^2/2} \int_{\mathbb{R}} e^{-u \cos \xi v} \cos(u \sin \xi v - k \xi v)\, \phi(v)\, dv \tag{9.1}$$

for $k = 0, 1, 2$, respectively. If we establish that (9.1), for $k = 2$, is greater
than 0 over the correct range, then (C) follows and showing (D1) is easy.
Fix $0 < u' < \infty$. If there exists a $v_0 > 2$ and $\xi_0 \in [0, \pi/(2 v_0)]$ such that

$$u' \sin \xi_0 v_0 + 2 \xi_0 v_0 = \pi/4 \tag{9.2}$$

and such that the part of the integral over $|v| > v_0$ is completely dominated
by the part over $|v| \le v_0$, then (9.1) will be greater than 0 for all $u \in [0, u']$
and all $\xi \in [0, \xi_0]$. Assume that $v_0 > 2$ and $\xi_0$ satisfies (9.2). Then

v ° , Q5 / 

e -« cos ^cos(u'smt v - 2£ v)<j>(v) dv > ^pL e ~ u 

-VQ V 2 

and 

e -«'cos&t; cos ( u / sin ^ oU _2^ oU )^( 1 ,)d u < _^ e «'-^/2. 

|u|>«0 V27T 

Thus, the total integral is clearly positive when 

JLe^/a/^e-'r 1 ^ 
v/2i I \/2 J 4 

Note that this is satisfied whenever $v_0 > 2\sqrt{1 \vee u'}$. Choosing any positive
$\xi_0 \le \pi (1 \vee u')^{-3/2}/24$ assures that there exists a $v_0 > 2\sqrt{1 \vee u'}$ which also
satisfies (9.2). Setting $t e^{\xi_0^2/2} = u'$, we have that $\xi_0 = (1/8)(1 \vee t)^{-3/2}$ is
sufficient and then $\varepsilon_0(t) = (1 \vee t)^{-3}/64$ works. However, to satisfy (D2) and (D3)
we reduce the rate to $(1 \vee t)^{-4}/64$. Lemma 3 gives that this rate is sufficient.
□

Lemma 3. Conditions (D2) and (D3) are satisfied by the log-normal frailty model with $\varepsilon_0(t) = (1 \vee t)^{-4}/64$, $c_1(\gamma) = 1$ and $c_2(\gamma) = 1/\gamma$.

Proof. For (D2), let $W_k = e^{\sqrt{\gamma_k} Z - \gamma_k/2}$, where $Z \sim N(0, 1)$. Now,

$\sup_{u > 0} u\Lambda_{\gamma_k}(u) = \sup_{u > 0} E[ue^{-uW_k}] \le E\left[\sup_{u > 0} ue^{-uW_k}\right],$

$\sup_{u > 0} |u^2 \dot\Lambda_{\gamma_k}(u)| = \sup_{u > 0} E[u^2 W_k e^{-uW_k}] \le E\left[\sup_{u > 0} u^2 W_k e^{-uW_k}\right]$

and it follows since $E[W_k^{-1}] = e^{\gamma_k}$. For (D3), if $t_k \to \infty$ and $\gamma_k \to 0$, then $\gamma_k$ consists of one or both of two subsequences, one less than or equal to 0 and one greater than or equal to 0. Without loss of generality, assume $\gamma_k \to 0$ from above or below but not both. We begin with a sequence approaching 0 from below. Let $\xi = \sqrt{|\gamma|}$, $\xi_k = \sqrt{|\gamma_k|}$, $u = te^{\xi^2/2}$, $u_k = t_k e^{\xi_k^2/2}$, and reparameterize $t \inf_{w \in [0,t]} \dot G_\gamma(w)$ for $\gamma < 0$ as

$t \inf_{w \in [0,t]} \dot G_{-\xi^2}(w) = u \inf_{w \in [0,u]} \frac{\int_{\mathbb{R}} e^{-w\cos \xi v} \cos(w \sin \xi v - \xi v)\,\phi(v)\,dv}{\int_{\mathbb{R}} e^{-w\cos \xi v} \cos(w \sin \xi v)\,\phi(v)\,dv} = u \inf_{w \in [0,u]} q_\xi(w).$

For $v > v_k = u_k^{2/3}$, the $v^2$ term in $\phi(v)$ completely dominates $u_k$ since $v_k^2/u_k \to \infty$. Since $u_k \xi_k v_k \le t_k^{-1/3} \to 0$,

$\inf_{w \in [0,u_k]} q_{\xi_k}(w) = \inf_{w \in [0,u_k]} \frac{\int_{\mathbb{R}} e^{w(1 - \cos \xi_k v)} \cos(w \sin \xi_k v - \xi_k v)\,\phi(v)\,dv}{\int_{\mathbb{R}} e^{w(1 - \cos \xi_k v)} \cos(w \sin \xi_k v)\,\phi(v)\,dv} \to 1.$

Hence, $u_k \inf_{w \in [0,u_k]} q_{\xi_k}(w) \to \infty$.

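Now assume that $\gamma_k \to 0$ from above. Then

$\frac{-t\dot\Lambda_\gamma(t)}{\Lambda_\gamma(t)} = \frac{\int_{\mathbb{R}} te^{\sqrt{\gamma} v - \gamma/2} \exp\{-te^{\sqrt{\gamma} v - \gamma/2}\}\,\phi(v)\,dv}{\int_{\mathbb{R}} \exp\{-te^{\sqrt{\gamma} v - \gamma/2}\}\,\phi(v)\,dv} = \frac{u \int_{\mathbb{R}} \zeta_u(w)\,\phi_\gamma(w)\,dw}{\int_{\mathbb{R}} e^{-w} \zeta_u(w)\,\phi_\gamma(w)\,dw},$

where $u = te^{-\gamma/2}$, $\zeta_u(w) = ue^{w}\exp\{-ue^{w}\}$ and $\phi_\gamma(w) = \gamma^{-1/2}\phi(\gamma^{-1/2}w)$. Since $e^{-w}$ is a decreasing function, for any $w_k \to \infty$,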

Jm.e- W ( u {w)(p 1 (w) dw Jig e-^CuM^M rfw 
SrCu(w)<I>7(w) dw ~ jZZ h Cu{w)(j) 1 {w)dw 

(9.3) 

_ Sw k eW Cu(-w)4> y (w)dw 
Now denote = ifce _7fe / 2 and let uifc = log(l + ^kUk + u k )■ Since 

-1 / , -wu\-l ( ^kUk \~ l 

l k w k {wk + u k e k ) =w k [lkWk J > YE J 00 

V 1 + 7^ + u k ' / 

for all w >Wk, the 4>^{w) term dominates the expectation in (9.3) with u = 

1/2 

Wfc and 7 = 7fc. Hence, with this substitution, (9.3) = 1 + 7fcHfc + u k + 0(1) 
and $t_k \dot G_{\gamma_k}(t_k) \to \infty$. The same arguments work when $\gamma_k \to \gamma > 0$, except that $\liminf_{k \to \infty} t_k \dot G_{\gamma_k}(t_k) \ge 1/\gamma$. Condition (D3) follows since (C) implies for $\gamma \ge 0$ that $\inf_{w \in [0,t]} \dot G_\gamma(w) = \dot G_\gamma(t)$. □
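
The (D2) step in the proof of Lemma 3 rests on two facts: $\sup_{u>0} ue^{-uW} = 1/(eW)$ pointwise, and $E[W^{-1}] = e^{\gamma}$ for the mean-one log-normal frailty. The following is a minimal numerical sketch of both, assuming $\gamma > 0$ and using a plain trapezoid rule; the function names, the integration range and the grid of $u$ values are illustrative choices, not part of the paper.

```python
import math

def normal_expectation(f, lo=-12.0, hi=12.0, n=2400):
    """Trapezoid-rule approximation of E[f(Z)] for Z ~ N(0, 1)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        v = lo + i * h
        end_weight = 0.5 if i in (0, n) else 1.0
        total += end_weight * f(v) * math.exp(-0.5 * v * v)
    return total * h / math.sqrt(2.0 * math.pi)

def frailty(v, gamma):
    """Mean-one log-normal frailty W = exp(sqrt(gamma) * Z - gamma / 2)."""
    return math.exp(math.sqrt(gamma) * v - 0.5 * gamma)

def laplace(u, gamma):
    """Laplace transform Lambda_gamma(u) = E[exp(-u W)]."""
    return normal_expectation(lambda v: math.exp(-u * frailty(v, gamma)))

def inverse_moment(gamma):
    """E[1 / W], which should be close to exp(gamma)."""
    return normal_expectation(lambda v: 1.0 / frailty(v, gamma))

def sup_u_times_laplace(gamma, grid):
    """Grid approximation of sup_u u * Lambda_gamma(u)."""
    return max(u * laplace(u, gamma) for u in grid)

gamma = 0.5                               # illustrative value only
grid = [0.25 * j for j in range(1, 81)]   # u in (0, 20]
print(inverse_moment(gamma), math.exp(gamma))
print(sup_u_times_laplace(gamma, grid), math.exp(gamma) / math.e)
```

The second printed pair compares the grid supremum with the bound $E[W^{-1}]/e$; the bound is typically not tight, since the maximizing $u$ differs across values of $W$.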

Proof of Proposition 2. Fix the sample size $n$. If the conclusion of this proposition does not hold, there exists a sequence $\{\theta_m = (\gamma_m, \beta_m) \in \Theta\}$ so that, for $\hat A_m = \hat A_{\theta_m}$, either $\limsup_{m \to \infty} \hat A_m(\tau) = \infty$ or $\liminf_{m \to \infty} \hat A_m(\tau) = 0$. Assume first that $\limsup_{m \to \infty} \hat A_m(\tau) = \infty$ and let $\{m_k\}$ be a subsequence for which $\lim_{k \to \infty} \hat A_{m_k}(\tau) = \infty$ and $\gamma_{m_k} \to \gamma$. Let $\hat\psi_m = (\gamma_m, \beta_m, \hat A_m)$ and $\tilde\psi_m = (\gamma_m, \beta_m, P_n N)$. Using arguments from the proof of Theorem 1, we can conclude that $\gamma > 0$ and, for a partition $0 = u_0 < u_1 < \cdots < u_J \le \tau$, that

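$0 \le L_n(\hat\psi_{m_k}) - L_n(\tilde\psi_{m_k})$

(9.4)  $\le c + \log(\hat A_{m_k}(\tau))\,P_n[\delta 1\{V \in [u_{J-1}, \infty]\} - (c_1(\gamma) + \delta) 1\{V \in [u_J, \infty]\}]$

$\quad + \sum_{j=1}^{J-1} \log(\hat A_{m_k}(u_j))\,P_n[\delta 1\{V \in [u_{j-1}, u_j]\} - (c_1(\gamma) + \delta) 1\{V \in [u_j, u_{j+1}]\}],$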

where $c \in (0, \infty)$ is a constant not depending on the parameter values or on the partition, and where the summation is 0 when $J = 1$. Let $T_1, \ldots, T_J$ be the observed failure times and define a partition with $u_J = T_J$, and, if $J > 1$, let $u_j \in (T_j, T_{j+1})$ for $j = 1, \ldots, J - 1$. Now the intervals $[u_j, u_{j+1}]$, for $j = 0, \ldots, J$, where $u_{J+1} = \infty$, all contain exactly one failure time. Thus, (9.4) $\le c - c_1(\gamma)[\log(\hat A_{m_k}(\tau)) + \sum_{j=1}^{J-1} \log(\hat A_{m_k}(u_j))]$. Hence, using again arguments from the proof of Theorem 1, (9.4) $\to -\infty$. This is a contradiction. Thus, $\limsup_{m \to \infty} \hat A_m(\tau) < \infty$. The proof that $\inf_{\theta \in \Theta} \hat A_\theta(\tau) > 0$ can also be obtained from arguments similar to those employed in the proof of Theorem 1. □

Proof of Theorem 1. Let $(\mathcal{X}^\infty, \mathcal{B}^\infty, P_0^\infty)$ be the probability space for infinite sequences of observations, let $W \subset \mathcal{X}^\infty$ be the set of observation sequences for which $P_n N$ converges uniformly to $\mu_0 = P_0 N$ and note that $P^*(W) = 1$. Then if the conclusion of Theorem 1 does not hold, there exist a sequence $\{\theta_n = (\gamma_n, \beta_n) \in \Theta\}$ and an $\omega \in W$ so that, for $\hat A_n = \hat A_{\theta_n}\{\omega\}$ (we will suppress dependence on $\omega$ hereafter), $\limsup_{n \to \infty} \hat A_n(\tau) = \infty$, $\liminf_{n \to \infty} \hat A_n(\tau) = 0$ or $\hat A_n$ is not asymptotically close to an absolutely continuous function with bounded derivative.

Assume first that $\limsup_{n \to \infty} \hat A_n(\tau) = \infty$. Now suppose $\gamma_n$ has an accumulation point at $\gamma < 0$ along a subsequence for which $\hat A_n(\tau) \to \infty$. But this is impossible by (C) and the constraints on $\mathcal{A}_{(\gamma)}$. Thus, $\gamma_n$ has no such accumulation points less than 0. Suppose, however, that $\gamma_n$ has 0 as one of these accumulation points. Accordingly, take a subsequence $\{n_k\}$ such that $\gamma_{n_k} \to 0$ and $\hat A_{n_k}(\tau) \to \infty$. Since, by (D1), $\dot G_\gamma(u) - \ddot G_\gamma(u)\{\dot G_\gamma(u)\}^{-1} = -\ddot\Lambda_\gamma(u)\{\dot\Lambda_\gamma(u)\}^{-1} > 0$, we have by (3.3) that



) 



x dP n {iV(i)} 



<o(i; 



uSlO.ifoA^Cr)] 



since $\inf_{t \in [0,\tau]} P_n Y(t)e^{\beta_n' Z(t)}$ is bounded below for all $n$ large enough. Thus, by (D3), $1 \le O(1)\{\hat A_{n_k}(\tau) \inf_{u \in [0, K_0 \hat A_{n_k}(\tau)]} \dot G_{\gamma_{n_k}}(u)\}^{-1} \to 0$, which is a contradiction. Hence, for any subsequence with $\hat A_{n_k}(\tau) \to \infty$, the accumulation points of $\gamma_{n_k}$ are greater than 0. Now let $\{n_k\}$ be a subsequence with $\hat A_{n_k}(\tau) \to \infty$ and $\gamma_{n_k} \to \gamma > 0$.

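Let $\hat\psi_n = (\gamma_n, \beta_n, \hat A_n)$ and $\tilde\psi_n = (\gamma_n, \beta_n, P_n N)$. Then

$0 \le L_{n_k}(\hat\psi_{n_k}) - L_{n_k}(\tilde\psi_{n_k})$

$\le P_{n_k}\left[ \int_0^\tau \left\{ \log \frac{\dot G_{\gamma_{n_k}}(H^{\hat\psi_{n_k}}(s))}{\dot G_{\gamma_{n_k}}(H^{\tilde\psi_{n_k}}(s))} + \log(n_k \Delta\hat A_{n_k}(s)) \right\} dN(s) + \log \frac{\Lambda_{\gamma_{n_k}}(H^{\hat\psi_{n_k}}(V))}{\Lambda_{\gamma_{n_k}}(H^{\tilde\psi_{n_k}}(V))} \right]$

$\le O(1) + P_{n_k}\left[ \int_0^\tau \log(n_k \Delta\hat A_{n_k}(s))\,dN(s) + [-\delta(1 + c_1(\gamma)) \log H^{\hat\psi_{n_k}}(V)] \wedge 0 - (1 - \delta) G_{\gamma_{n_k}}(H^{\hat\psi_{n_k}}(V)) \right]$

since

$P_{n_k}\left[ \int_0^\tau \log\{\dot G_{\gamma_{n_k}}(H^{\tilde\psi_{n_k}}(s))\}\,dN(s) + \log \Lambda_{\gamma_{n_k}}(H^{\tilde\psi_{n_k}}(V)) \right] = O(1)$

and since (D2) implies

$\log \dot G_{\gamma_{n_k}}(H^{\hat\psi_{n_k}}(s)) \le [-(1 + c_1(\gamma)) \log H^{\hat\psi_{n_k}}(s)] \wedge 0 + O(1).$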
Now $L_{n_k}(\hat\psi_{n_k}) - L_{n_k}(\tilde\psi_{n_k})$ is bounded above by

(9.5)  $O(1) + P_{n_k}\left[ \int_0^\tau \log(n_k \Delta\hat A_{n_k}(s))\,dN(s) + [-(\delta + c_1(\gamma)) \log \hat A_{n_k}(V)] \wedge 0 \right]$

since (D2) also implies $G_{\gamma_{n_k}}(H^{\hat\psi_{n_k}}(V)) \ge [c_1(\gamma) \log \hat A_{n_k}(V)] \vee 0 + O(1)$. For a sequence $0 = u_0 < u_1 < u_2 < \cdots < u_J = \tau$, let $N^j(s) = N(s) 1\{V \in [u_{j-1}, u_j]\}$, $j = 1, \ldots, J$. By Jensen's inequality,

$\int_0^\tau \log(n_k \Delta\hat A_{n_k}(s))\,dP_{n_k}\{N^j(s)\}$

$\le P_{n_k} N^j(\tau) \log\left\{ \int_0^\tau n_k \Delta\hat A_{n_k}(s)\,dP_{n_k} N^j(s) \Big/ P_{n_k} N^j(\tau) \right\}$

$\le O(1) + \log(\hat A_{n_k}(u_j))\,P_{n_k}(\delta 1\{V \in [u_{j-1}, u_j]\}).$



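Thus, (9.5) is dominated by

(9.6)  $O(1) + \log(\hat A_{n_k}(\tau))\,P_{n_k}[\delta 1\{V \in [u_{J-1}, \infty]\} - (c_1(\gamma) + \delta) 1\{V \in [\tau, \infty]\}]$

$\quad + \sum_{j=1}^{J-1} \log(\hat A_{n_k}(u_j))\,P_{n_k}[\delta 1\{V \in [u_{j-1}, u_j]\} - (c_1(\gamma) + \delta) 1\{V \in [u_j, u_{j+1}]\}].$

Choose $\varepsilon$: $0 < \varepsilon < P_0\{V = \tau\}$ and the sequence $\{u_j\}$ for finite $J$ such that $c_1(\gamma)\varepsilon/(c_1(\gamma) + 1) < \mu_0(u_{j+1}) - \mu_0(u_j) < c_1(\gamma)\varepsilon$ for $j = 0, \ldots, J - 2$ and $\mu_0(\tau) - \mu_0(u_{J-1}) = c_1(\gamma)\varepsilon/(c_1(\gamma) + 1)$. Note that (3.3) implies $\log(\hat A_{n_k}(u)) \ge \log(P_{n_k} N(u)) + O(1)$, since $\dot G_{\gamma_{n_k}}(t) - \delta \ddot G_{\gamma_{n_k}}(t)/\dot G_{\gamma_{n_k}}(t) \le 2 - \ddot G_{\gamma_{n_k}}(0+)$ for all $t \ge 0$ and all $k$ large enough. Since also $P_{n_k} N \to \mu_0$ uniformly, (9.6) goes to $-\infty$. This is a contradiction. Thus, $\limsup \hat A_{n_k}(\tau) < \infty$.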

Now assume that there are a sequence {(jmfln) G 0} and an u> G W 
so that liminfn^oo A n (r) = 0. Define A n = e$ 1 (co)¥ ti N/Kq and note that 
A n G »4( 7n ) for all n > 1. Now let {n*.} be a subsequence with A nfe ("r) — ► 
and define Vn = (in, fin, A n ). Then 

0< ln k $n k ) -L nk (l/> nk ) 

< O(l) +P nfc |^ T log(Ai nfe ( S ))diV( s ) 

< O(l) + F nk { jT log(i nfc (t)) diV(s)| - -00. 
This is again a contradiction. Hence, $\liminf \hat A_n(\tau) > 0$. By previous arguments and (3.3), we also have, for $s, t \in [0, \tau]$, that

$|\hat A_n(s) - \hat A_n(t)| \le O(1) P_n|N(s) - N(t)| \le O(1)|\mu_0(s) - \mu_0(t)| + o(1),$

and the conclusions of the theorem hold by (A2). □
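
The identity $\dot G_\gamma - \ddot G_\gamma/\dot G_\gamma = -\ddot\Lambda_\gamma/\dot\Lambda_\gamma$ invoked in the proof of Theorem 1 follows from $G_\gamma = -\log \Lambda_\gamma$. As a hedged illustration, the sketch below checks it for the gamma frailty with mean 1 and variance $\gamma$, whose Laplace transform is $\Lambda_\gamma(t) = (1 + \gamma t)^{-1/\gamma}$. The closed forms are worked out by hand for this one family and the function names are illustrative; the sketch does not verify conditions (C) or (D1) themselves.

```python
def g_dot(t, gamma):
    """First t-derivative of G_gamma(t) = log(1 + gamma * t) / gamma."""
    return 1.0 / (1.0 + gamma * t)

def g_ddot(t, gamma):
    """Second t-derivative of G_gamma."""
    return -gamma / (1.0 + gamma * t) ** 2

def lam_dot(t, gamma):
    """First t-derivative of Lambda_gamma(t) = (1 + gamma * t) ** (-1 / gamma)."""
    return -((1.0 + gamma * t) ** (-1.0 / gamma - 1.0))

def lam_ddot(t, gamma):
    """Second t-derivative of Lambda_gamma."""
    return (1.0 + gamma) * (1.0 + gamma * t) ** (-1.0 / gamma - 2.0)

def lhs(t, gamma):
    """G-dot minus G-double-dot over G-dot."""
    return g_dot(t, gamma) - g_ddot(t, gamma) / g_dot(t, gamma)

def rhs(t, gamma):
    """Minus Lambda-double-dot over Lambda-dot."""
    return -lam_ddot(t, gamma) / lam_dot(t, gamma)

for gamma in (0.1, 1.0, 2.5):
    for t in (0.0, 0.5, 3.0):
        print(gamma, t, lhs(t, gamma), rhs(t, gamma))
```

For this family both sides reduce to $(1 + \gamma)/(1 + \gamma t)$, which is positive for all $t \ge 0$.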

Proof of Lemma 1. Fix a convergent sequence $\{\theta_m\} \in \Theta$. The arguments used in the proof of Theorem 1, after replacing the measure $P_n$ with $P_0$, can be used with only minor modification to show that $\limsup_{m \to \infty} A_{\theta_m}(\tau) < \infty$ and $\liminf_{m \to \infty} A_{\theta_m}(\tau) > 0$. Using again arguments from the proof of Theorem 1, we can also establish that $|A_{\theta_m}(s) - A_{\theta_m}(t)| \le c|\mu_0(s) - \mu_0(t)|$ for all $s, t \in [0, \tau]$ and a constant $c \in (0, \infty)$ not depending on the sequence. The desired results now follow from condition (A2). □

Proof of Theorem 2. Let $W \subset \mathcal{X}^\infty$ be the set of data sequences for which $P_n N \to \mu_0$ uniformly and note that the class of functions

$\mathcal{G}_k = \left\{ Y(t)e^{\beta' Z(t)} \left( \dot G_\gamma\{H^\psi(V)\} - \delta \frac{\ddot G_\gamma\{H^\psi(V)\}}{\dot G_\gamma\{H^\psi(V)\}} \right) : t \in [0, \tau],\ \psi \in \Psi \text{ and } A(\tau) \le k \right\}$

is $P_0$-Glivenko-Cantelli for each $k < \infty$. To see this, arguments given in the proof of Proposition 8 verify that the classes $\{Y(t)e^{\beta' Z(t)} : t \in [0, \tau], \beta \in B_0\}$ and $\{H^\psi(V) : \psi \in \Psi, A(\tau) \le k\}$ are Donsker; conditions (C) and (D1) imply that the maps $(\gamma, t) \mapsto \dot G_\gamma(t)$, $(\gamma, t) \mapsto \ddot G_\gamma(t)$ and $(\gamma, t) \mapsto [\dot G_\gamma(t)]^{-1}$ are bounded and Lipschitz over the domain $[-\varepsilon_0(u), m_1] \times [0, u]$ for any $u \in (0, \infty)$; and $A(\tau) \le k$ implies that $H^\psi(V) \le kK_0$ for all $\psi \in \Psi$ almost surely. Thus, the classes $\{\dot G_\gamma\{H^\psi(V)\} : \psi \in \Psi, A(\tau) \le k\}$, $\{\ddot G_\gamma\{H^\psi(V)\} : \psi \in \Psi, A(\tau) \le k\}$ and $\{[\dot G_\gamma\{H^\psi(V)\}]^{-1} : \psi \in \Psi, A(\tau) \le k\}$ are all Donsker by Theorem 2.10.6 of VW. Since products of bounded Donsker classes are Donsker, the class $\mathcal{G}_k$ is Donsker and hence also Glivenko-Cantelli.

For each M < oo, let Wm C W be the subset of data sequences for which 
the limit points of {Ag : 9 E G} are in /Cm and also for which P n — ► Po in 
£°°(Gm). For any 9 G 9, let Ag{t) = J* [jg 6 (s) + po(^)] _1 Pn{^W}, where 
J? = P Jt and 

M^) = • 

Also let iJjq = (9,Ag) and tpg = (9,Ag). We will first show that 

(9.7) sup|L n (^)-Z n (^)|->0 

eee 
outer almost surely and then show that

(9.8) SUpjLnO/^J - Ln{^6 2 ) ~ > ^2) I ~* 

6»i,6» 2 eG 

outer almost surely, and the proof will be complete. 

Fix M < 00, choose a w £ Wm and let {n} index the corresponding data 
sequence. Let {9 n } be any parallel sequence of parameters in G and let {n^} 
be any convergent subsequence with 9 nk — > 9*, tpg n tp* = (9*, A*) 

and ^e„ k ~ ¥ ^** = (9*,Ag*). The last convergence statement follows from 
the definitions of Ag and Ag. Since 

dA nk (t) 4 6nk (t) + Po^e nk ) 

(9.9) 

~" jf(t)+ P0 (r) ' 

uniformly over t £ [0, r], where the limit of (9.9) = dA* (t) jdAg* (t) is bounded 
below and in total variation, we have that 

F n £\og{dAe nk (t)/dA enh (t)}dN(t) ^ £ log{dA*(t)/dA e *(t)}dti (t). 

Hence, it follows that 

< L nk ) - L nk ) - P log ^ < 0. 

Since this is true for every such convergent subsequence, and since M < oo 
can be increased so that P*(Wm) is arbitrarily close to 1, we have estab- 
lished (9.7). 

Since by Lemma 1 



dA 9l (t) _ jp(t) + p (^ 



02 J 



is bounded below and in total variation, uniformly over #i,#2 £ 6, we have 
that 



P n / log{dA dl (t)/dAe 2 (t)}dN(t)^ \og{dA dl {t)/dAe 2 {t)}d^{t), 
Jo Jo 

uniformly over 6>i,6>2 £ 6. Hence, it follows that (9.8) holds. □ 

Proof of Lemma 2. By the smoothness assumed in (Dl) of the in- 
volved derivatives, 



lim sup sup 



ip(cr^ 1+s t^(h) - a^{h)) ds 



0. 
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Thus, su VheHr - +^Mh))\ =o(U\\ {r) ) 

as |M| (r) -0. □ 

Proof of Theorem 3. Recall that $\hat A_n = \hat A_{\hat\theta_n}$ and $\hat\psi_n = (\hat\theta_n, \hat A_n)$, where $\hat\theta_n$ is the profile MLE. Theorem 1 implies that the set $W \subset \mathcal{X}^\infty$ of data sequences for which the limit points of $\hat A_n$ are in $\mathcal{K}_M$, for some $M < \infty$, has inner probability 1. Accordingly, fix the data sequence $\omega \in W$ and take a subsequence $\{n_k\}$ for which $\hat\psi_{n_k}$ converges uniformly to some $\psi = (\theta, A) \in \Psi$ with $A \in \mathcal{K}_M$ for some $M < \infty$. Let $\tilde\psi_n = (\theta_*, \tilde A_{\theta_*})$, where $\theta_* = (\gamma_*, \beta_*)$ and $\tilde A_\theta$ is as defined in the proof of Theorem 2. By Theorem 2 we have

$0 \le L_{n_k}(\hat\psi_{n_k}) - L_{n_k}(\tilde\psi_{n_k}) \to P_0 \log \frac{p_\psi}{p_{\psi_*}} \le 0,$

and hence condition (F) implies that $\psi = \psi_*$. Since this is true for every convergent subsequence, $\hat\psi_n \to \psi_*$ with inner probability 1. Since $\hat A_n$ is a piecewise constant function with mass $\Delta\hat A_n$ only at observed failure times $t_1, \ldots, t_{m_n}$, $\hat\psi_n$ is a functional of a maximum taken over $m_n + d + 1$ real variables. This structure implies that $\sup_{t \in [0,\tau]} |\hat A_n(t) - A_*(t)|$ is a measurable random variable, and hence the uniform distance between $\hat\psi_n$ and $\psi_*$ is also measurable. Thus, the convergence with inner probability 1 can be strengthened to outer almost sure convergence. □

Proof of Proposition 3. Since the failure time distribution does not depend on covariates, there exists an $A_0 \in \mathcal{A}_0$ so that $S_0(t) = \exp(-A_0(t))$ for all $t \in [0, \tau]$. Hence, the parameter value $\psi_0 = (\gamma_0 = 0, \beta_0 = 0, A_0)$ for the posited model describes the true distribution of the failure times. Arguments in Theorem 3 now yield that, with inner probability 1, all limit points of the maximum likelihood estimator $\hat\psi_n$ lie in a compact set $\Psi_*$ for which $P_0 \log(p_\psi/p_0) = 0$. Since $P_0 \log(p_\psi/p_0) \le -\int_{\mathcal{X}} (p_\psi^{1/2} - p_0^{1/2})^2\,d\nu$ and $p_0 = p_{\psi_0}$, we now know that all Kullback-Leibler maximizers $\psi = (\gamma, \beta, A) \in \Psi_*$ must satisfy $G_\gamma(H^\psi(t)) = G_0(H^{\psi_0}(t))$ for all $t \in [0, \tau]$. This implies that $\beta = \beta_0$ by arguments given in Proposition 4. The desired result now follows. □

PROOF of Theorem 4. From Section 3.3, we have cr^ (/13) = /J" f\(s) x 
/*3(s)dA,00, ^(^3) =/ V 2 ( s )h 3 ( s )^( S ) and a™(h 3 ) *=9i(-)%Ms) X 
hs(s) dA 3f (s) + g2(-)h 3 (-) , where fi, /3, <?i :R i-> R and / 2 :R i-> R d are bounded 
and where <?2(s) = Po{£^^(s)e^* Z ^}. From the proof of Theorem 1,0 < Q2{s) < 00 
for all s G [0, r]. Thus, oy,, = a$ + cr^, where 

/l \ //tA 

7 /12 

V0 52(0/ V/13/ 
is continuously invertible and onto, $\sigma_2 = \sigma_{\psi_*} - \sigma_1$ is compact and $I$ denotes the identity. Since $\sigma_{\psi_*}$ is one-to-one by condition (G), $\sigma_{\psi_*}$ is continuously invertible and onto, with inverse $\sigma_{\psi_*}^{-1}$. This now implies that, for each $r > 0$, there is an $s > 0$ with $\sigma_{\psi_*}^{-1}(H_s) \subset H_r$. Fix $r > 0$. Continuous invertibility of $-\dot U_{\psi_*}$ on lin $\Psi$, where lin denotes linear span, now follows by Proposition A.1.7 of BKRW since
t II^.WOII(r) ^ f SU Ph^(H q )\^M)\ 
int ^— n — > ini 



iPelin<S> \\tp\\( r ) V6lin* \W\(r) 



in f <±lM > q 



V>e* HV'll(r) 3r 

This proposition also implies that — CX/,„ is onto. By Proposition 8, 

^(U^ n )(h) - U^ n )(h)) - MK{^*){h) - USMih)) 

(9.10) 

= o P (l + y/n\\ip n - ip*\\) } 

uniformly over h E H r , where || • || is the uniform metric. 

Applying the Z-estimator master theorem (Theorem 3.3.1 of VW) now gives the desired weak convergence of $\sqrt{n}(\hat\psi_n - \psi_*)$, provided $U_0^\tau(\psi_*)(\cdot) = 0$, $U_n^\tau(\hat\psi_n)(\cdot) = 0$ asymptotically and $U^\tau(\psi_*)(\cdot)$ is $P_0$-Donsker. The first two conditions follow from condition (F) and the fact that $\hat\psi_n$ is asymptotically an interior point in $\Psi$ by consistency and condition (C). Because products of bounded Donsker classes are Donsker, showing $\delta h_3(V)$ and $\int_0^\tau Y(s)e^{\beta_*' Z(s)} h_3(s)\,dA_*(s)$ are Donsker, as processes indexed by $h_3 : (h_1, h_2, h_3) \in H_r$, is sufficient, since $\beta_*$ and $A_*$ are fixed. First, since all functions in $H_r$ are bounded in total variation, $\delta h_3(V)$ is Donsker, as a class indexed by $h_3$, since it is the product of bounded Donsker classes. Next, $\{\beta_*' Z(t) : t \in [0, \tau]\}$ is Donsker since the total variation of $Z$ is less than or equal to $m_0$ with probability 1 by condition (A3). Since $\exp(\cdot)$ is Lipschitz on compacts and $\{Y(t), t \in [0, \tau]\}$ is monotone and bounded, $\{Y(t)e^{\beta_*' Z(t)}\}$ is Donsker. Finally, since the map from $Y(\cdot)e^{\beta_*' Z(\cdot)}$ to $\int_0^\tau Y(s)e^{\beta_*' Z(s)} h_3(s)\,dA_*(s)$, as a map from an element in $\ell^\infty([0, \tau])$ to $\ell^\infty(H_r)$, is continuous and linear, the continuous mapping theorem yields the desired Donsker property. These results now imply that $\hat\psi_n(h)$ is asymptotically linear with influence function $\tilde\ell(h) = U^\tau(\psi_*)(\sigma_{\psi_*}^{-1}(h))$ and covariance $V_*(g, h) = E[\tilde\ell(g)\tilde\ell(h)]$ for $g, h \in H_r$. Taking $r \ge 1$ yields weak convergence in the uniform metric, since $H_1$ is sufficiently rich as noted earlier. □



Proposition 8. Expression (9.10) holds. 
Proof. If for some $\varepsilon > 0$, $\{U^\tau(\psi)(h) - U^\tau(\psi_*)(h) : \|\psi - \psi_*\| < \varepsilon, h \in H_r\}$ is $P_0$-Donsker and $\lim_{\psi \to \psi_*} \sup_{h \in H_r} P_0\{U^\tau(\psi)(h) - U^\tau(\psi_*)(h)\}^2 = 0$, then (9.10) holds by Lemma 3.3.5 of VW. The latter condition follows from condition (D1). The Donsker condition requires more work. Let $\Psi_\varepsilon = \{\psi : \|\psi - \psi_*\| < \varepsilon\}$. Take $\varepsilon$ small enough so that $\Psi_\varepsilon \subset \Psi$: such an $\varepsilon$ always exists by (C) and the fact that $\gamma_* \ge 0$. Because $Z$ has bounded total variation and the class $\{\beta, \beta \in B_0\}$ is trivially a bounded Donsker class, $\{\beta' Z(t), \beta \in B_0, t \in [0, \tau]\}$ is Donsker. Since $\exp(\cdot)$ is Lipschitz on compacts and $\{Y(t), t \in [0, \tau]\}$ is monotone and bounded, the class $\{Y(t)e^{\beta' Z(t)}, \beta \in B_0, t \in [0, \tau]\}$ is Donsker. Because $A_\varepsilon = \{A : (\gamma, \beta, A) \in \Psi_\varepsilon\}$ is uniformly bounded in total variation, the map $Y(t)e^{\beta' Z(t)} \mapsto \int_0^\tau Y(s)e^{\beta' Z(s)}\,dA(s)$, as a map from an element in $\ell^\infty(B_0 \times [0, \tau])$ to an element in $\ell^\infty(B_0 \times A_\varepsilon)$, is continuous and linear, and the continuous mapping theorem yields that $\int_0^\tau Y(s)e^{\beta' Z(s)}\,dA(s)$ is Donsker

as a process in £°°(^ E ). By conditions (C) and (Dl), G 7 (t), G 7 (t), G^t), 

G 7 (f) and [G 7 (i)] -1 are Lipschitz in 7 and f over the appropriate range. 
Thus, 



G 7 (^(V)) 



and 



< r 7v " /; - G 7 (F^(y)) 
G 7 (F^(y)) 



are also Donsker as processes in £°°{^ e ). Similar results and the fact that 
both sums of Donsker classes and products of bounded Donsker classes are 
Donsker give the result. □ 



Proof of Corollary 1. We first prove (ii). Using arguments from 
the proof of Theorem 4 and applying the Z-estimator master theorem (The- 
orem 3.3.1 ofVW) gives that v^(C = V^n^WOO^O) + o P (l) 
unconditionally, where op denotes a quantity approaching in outer prob- 
ability. Since V™(C - $n) = V™(K ~ ^n)U T (ifj*)(a- l (-)) + o P (l) uncondi- 
tionally, (ii) follows by the multiplier central limit theorem (Theorem 2.9.6 
of VW) since U T (ip*)(cr^(-)) is Po-Donsker and, over this Donsker class, 

n 

y^(P° " Pn) = C^Ete " Cn)(A Xl - P ) 

i=l 

n 

= v^E(C* - l)(Ax 4 - Pq) + op(l), 

i=l 

where C, n = n~ l J2?=i (i an d is the point mass at X. Similar arguments 
establish (i), but the nonparametric bootstrap central limit theorem (Theo- 
rem 3.6.1 of VW) is used in place of Theorem 2.9.6. □ 
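
To make the multiplier (weighted) bootstrap used in part (ii) more concrete, here is a minimal sketch for a toy statistic. It assumes i.i.d. standard exponential multipliers (mean 1, variance 1) normalized by their sample mean, and it uses the sample mean of scalar data as a stand-in for the frailty-model estimator. The weight distribution, the function names and the toy data are illustrative assumptions, not the paper's implementation.

```python
import random

def normalized_multiplier_weights(n, rng):
    """Draw n positive i.i.d. multipliers and divide by their average,
    so that the resulting weights sum to n."""
    xi = [rng.expovariate(1.0) for _ in range(n)]
    xi_bar = sum(xi) / n
    return [x / xi_bar for x in xi]

def weighted_mean(data, weights):
    """Weighted empirical mean n^{-1} * sum_i w_i * x_i."""
    return sum(w * x for w, x in zip(weights, data)) / len(data)

def multiplier_bootstrap_means(data, reps, seed=0):
    """Recompute the weighted statistic under `reps` independent weight draws."""
    rng = random.Random(seed)
    n = len(data)
    return [weighted_mean(data, normalized_multiplier_weights(n, rng))
            for _ in range(reps)]

data = [0.3, 1.2, 2.5, 0.7, 1.9, 3.1, 0.4, 1.1]   # toy observations
draws = multiplier_bootstrap_means(data, reps=500)
center = sum(data) / len(data)
spread = (sum((d - center) ** 2 for d in draws) / len(draws)) ** 0.5
print(center, spread)
```

In the setting of the corollary the same weights would multiply each subject's log-likelihood contribution before maximizing; the spread of the re-maximized estimates around the original estimate then approximates its sampling variability.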
Proof of Proposition 4. Since $P_0 \log(p_\psi/p_0) \le -\int_{\mathcal{X}} (p_\psi^{1/2} - p_0^{1/2})^2\,d\nu$ and $p_0 = p_{\psi_0}$, we are done if we can show that, for any $\psi \in \Psi$,

(9.11)  $G_\gamma(H^\psi(t)) = G_{\gamma_0}(H^{\psi_0}(t))$

for all $t \in [0, \tau]$ implies $\psi = \psi_0$ almost surely. Note that this requirement is valid even for $\gamma < 0$, since by (C) and (D1) an appropriate extension of $G_\gamma$ and its corresponding density $p_\psi$ exist. Taking the derivative of both sides of (9.11) with respect to $t$ yields

(9.12)  $\dot G_\gamma(H^\psi(t))\,e^{\beta' Z(t)}\,b(t) = \dot G_{\gamma_0}(H^{\psi_0}(t))\,e^{\beta_0' Z(t)},$

where $b = a/a_0$. Letting $t \downarrow 0$ in (9.12) gives $b(0+)e^{\beta' Z(0+)} = e^{\beta_0' Z(0+)}$ by condition (D1). This implies $\beta = \beta_0$ since var$[Z(0+)]$ is positive definite. Hence, $b(0+) = 1$ by condition (H). Setting $\beta = \beta_0$ and dividing both sides of (9.12) by $e^{\beta_0' Z(t)}$, differentiating with respect to $t$ and letting $t \downarrow 0$ gives $\ddot G_\gamma(0+)e^{\beta_0' Z(0+)} + \dot b(0+) = \ddot G_{\gamma_0}(0+)e^{\beta_0' Z(0+)}$, where $\dot b = db/dA_0$. Since var$[\beta_0' Z(0+)] > 0$, $\dot b(0+) = 0$. This now proves $\gamma = \gamma_0$ since $\ddot G_\gamma(0+)$ is monotone and bounded in $\gamma$. Now $A = A_0$ follows trivially. □
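
The opening inequality of this proof, $P_0 \log(p_\psi/p_0) \le -\int (p_\psi^{1/2} - p_0^{1/2})^2\,d\nu$, follows from $\log x \le 2(\sqrt{x} - 1)$ applied to $x = p_\psi/p_0$. The sketch below checks it on a small discrete example; the two probability vectors are arbitrary illustrative values and the function names are hypothetical.

```python
import math

def expected_log_ratio(p, p0):
    """P_0 log(p / p0) for discrete distributions on a common finite support."""
    return sum(q0 * math.log(q / q0) for q, q0 in zip(p, p0) if q0 > 0.0)

def neg_squared_hellinger(p, p0):
    """Minus the sum of (sqrt(p) - sqrt(p0))^2 over the support."""
    return -sum((math.sqrt(q) - math.sqrt(q0)) ** 2 for q, q0 in zip(p, p0))

p0 = [0.5, 0.3, 0.2]   # plays the role of the true density
p = [0.2, 0.5, 0.3]    # plays the role of a candidate model density
print(expected_log_ratio(p, p0), neg_squared_hellinger(p, p0))
print(expected_log_ratio(p0, p0), neg_squared_hellinger(p0, p0))
```

Both quantities vanish when the two distributions coincide, which is why a Kullback-Leibler maximizer attaining the value 0 must reproduce the true density.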

Proof of Theorem 5. With any $h \in H_r$ such that $\sigma_{\psi_0}(h) = 0$, define the regular parametric one-dimensional submodel $\psi_{0t}(h) = \psi_0 + t\{h_1, h_2, \int_0^{(\cdot)} h_3(s)\,dA_0(s)\}$. Note that $\sigma_{\psi_0}(h) = 0$ implies

$-P_0\left\{ \frac{\partial^2}{(\partial t)^2} L_n(\psi_{0t}(h)) \Big|_{t=0} \right\} = P_0\{U^\tau(\psi_0)(h)\}^2 = 0,$

where the score operator $U^\tau$ is given by (3.4). But this implies $P_0\{U^\tau(\psi_0)(h) \mid G(n, y, t)\}^2 = 0$, where the random set $G(n, y, t) = \{N, Y : N(s) = n(s), Y(s) = y(s), s \in [t, \tau]\}$ has nonzero probability. This then implies that $U^t(\psi_0)(h) = 0$ almost surely for all $t \in [0, \tau]$ (here is where we need the dependence on $\tau$ mentioned above in Section 3.3). Assuming that $[\{N(s), Y(s), Z(s)\}, s \ge 0]$ is censored at $V \in (0, \tau]$,

(9.13)  $0 = G^{(1)}_{\gamma_0}(H^{\psi_0}(t))\,h_1 + \dot G_{\gamma_0}(H^{\psi_0}(t)) \int_0^t Y(s)e^{\beta_0' Z(s)}\{h_2' Z(s) + h_3(s)\}\,dA_0(s).$

Taking the derivative with respect to $t$ and letting $t \downarrow 0$ yields $h_2' Z(0+) + h_3(0+) = 0$ since $\dot G^{(1)}_{\gamma_0}(0+) = 0$ and $\dot G_{\gamma_0}(0+) = 1$ by assumption (D1). But this implies $h_2 = 0$ by condition (A3). Dividing (9.13) by $\dot G_{\gamma_0}(H^{\psi_0}(t))$, differentiating with respect to $t$ and taking $h_2 = 0$ yields

$0 = [\dot G^{(1)}_{\gamma_0}(H^{\psi_0}(t))\,\dot G_{\gamma_0}(H^{\psi_0}(t)) - G^{(1)}_{\gamma_0}(H^{\psi_0}(t))\,\ddot G_{\gamma_0}(H^{\psi_0}(t))]\,h_1 + [\dot G_{\gamma_0}(H^{\psi_0}(t))]^2 h_3(t).$

Differentiating again with respect to $t$ and letting $t \downarrow 0$ gives $0 = \ddot G^{(1)}_{\gamma_0}(0+)\,e^{\beta_0' Z(0+)} h_1 + \dot h_3(0+)$, where $\dot h_3 = a_0^{-1}\,dh_3/dt$. Now (H) and (D1) yield $h_1 = 0$. Thus, (9.13) implies $h_3(t) = 0$ for all $t \in [0, \tau]$, and the desired result follows. □

Proof of Theorem 6. Define 

*M = = (7,/M) =7 e [-£o(^o^W),mi], 

/3G5 ,AG A m ,1/M <a< M}. 

For ft, = (hi,h 2 ,h3), with /ii el, h 2 G M d and /i 3 G L 2 ([0,r]), also define 
the metric ||/i||{2} — |^a| + \J h%h~2 + (/J" ds) 1//2 and let the space of all 
such h with ||/i||{2} < 00 be denoted -H{2}- Let P-^g = J x gPipdv and denote 
oy, = P^o^p. Arguments in the proofs of Theorems 4 and 5 can be readily 
reworked to yield that oy, is one-to-one and continuously invertible as an 
operator in H{ 2 }- Thus, by the uniform compactness of ^m, by continuity 
and by Proposition 4 and Theorem 5, there exist constants bi,b 2 > and 
ki, k 2 , &3 < oo such that, for all ip G ^m, 

i> h {a^{h))>b x \\h\\\ 2} and ^ h (a^(h)) < h \\h\\ 2 {2} 

^-almost surely V h G H{ 2 } , 
\\^(h)\\ v > b 2 \\h\\ v and ||<X0(/l)||„ < fc 2 ||fr||„, 

za- almost surely V/i G H^, 

and 

1/^3 < < ks, ^-almost surely. 

Choose £i = (2/3)({6i/fci} A {b 2 /k 2 }) and, for any / G D(v) [where D(v) is 
defined in Section 6.1] and tp G denote = J x &ipP(f) dv. Then, for 
all pairs (/, ip) with / G D(v), ip G *m and J x \f — fy\ dv < ei, we have 
iP h {af(h)) > (2/3)6 2 |N|f 2} for all h G H {2} and \\a { J\h)\\ v > (2/3)b 2 \\h\\ v 
for all h G . 

For the chosen Q, let e = {el/[20k 3 Q(Q + k 3 )}} A{ei/3}. Then, for all pairs 
(/, ip) with / G D(v) , tp G \Pm and J x \ f — fy\ dv < e, we have a unique max- 
imizer 

■01 = argmax / log(p^„/p(/))p(/) dv 

( f) 

such that || crV (/i)lk> > (2/3)Z>2 1|^||«> f° r all h G -ffoo- To see this, note that, for 
(f,ip) and all ip* with J x \p^ —p^\ dv < (2/3)e%, we have J x \p(f) — p^„ \ dv < 

Ix\f ~ U\d» + fx \P4> ~ V4,Mv < £l . Thus, ${p$(h)) > ^{5^{h)) ~ 
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^ilHI{2} fx \P(f) ~ Pi>* \ d v — (1/3)^1 1|^||{2} f° r au h G H^ 2 } and, arguing in a 

similar manner, ||cr^(/t)||„ > (l/3)6 2 ||/i||t, for all h G H^,. Thus, Jx^°siPi>*/P(f))P(f) dv 
is a convex function in tp* with a continuously invertible second derivative, 
provided J x |p^ - p^\dv < (2/3)ei. 

Furthermore, whenever -0* satisfies J x \p^, — | < e/10, we have fx^°E(Pil>*/P(f))P(f) du> 
—ksQ J x \pip t —P(f)\ du > — /c3<5(H/10)e, but whenever ip* satisfies |p^, — 
Pipjdu > (2/3)ei, wehave/^log(p^/p (/ ))xp (/ ) dz/ < -[2(Q + A;3)] -1 (Xv 
p</Jc^) 2 < -£i/[18(<3 + fc3)] < -/c 3 (5(ll/10) e - Hence, any Kullback-Leibler 
maximizer ?/>(/) must satisfy J x \p^ — Pip, f A dv < (2/3)e\. The desired exis- 
tence and uniqueness of ^(j) now follow, and P^a^^ f) is one-to-one since 

\\P(f)&ilirf\ (h)\\v > fo r au h S Hoo and some b > by arguments given 

above. □ 



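Proof of Theorem 7. Assume without loss of generality that $E[Z_1] = 0$. Note that $A_*$ is uniformly bounded and equicontinuous by Lemma 1. Define $K_\gamma(\delta, t) = \dot G_\gamma(t) - \delta \ddot G_\gamma(t)/\dot G_\gamma(t)$, $\dot K_\gamma(\delta, t) = \partial K_\gamma(\delta, t)/(\partial t)$ and $U(x; \psi) = z_1\{\delta - H^\psi(v) K_\gamma(\delta, H^\psi(v))\}$, where $x = (v, \delta, z) \in \mathcal{X}$, $z = (z_1, z_2)$ and $z_1$ is a possible value of $Z_1$. For each $\psi = (\gamma, \beta = (\beta_1, \beta_2), A) \in \Psi$ define $\psi^{(0)} = (\gamma, (0, \beta_2), A)$.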

Assume first that Fq 1 ^) is constant in z\. It is clear that E[U (X ; ipl°^ )] = 
0. If /3*i > 0, then conditions (Al'), (A3') and (El) imply 

E[Z 1 H^{V)K 1 ^5,H^{V))]>E[Z l H^\v)k 1 {8,H^ ) {V))]=Q, 

and thus ~Ej\U(X;ip*)] < 0, but this is a contradiction. Similar arguments 
show that if f3*i < 0, ~E[U(X; ip*)] > 0, also yielding a contradiction. Hence, 
= and (i) follows. 
Now assume Fq 1 is strictly increasing in z\. If < 0, then by condi- 
tion (El) 



but 



E[Zii?^ (y)A' 7 , (<5, (V))] < E[ZiiT^ 0) (F)K 7 , (5, tf^ (V))], 



nz 1 {8-H** \v)k^(5,Hi'? ) (y))}] 



(9.14) = E[-ZiiJ^* 0) (F)G 7 . (H^ 0) (V))} 



+ E 



G 7 ^(H^* (V)) 



Now both J and —V are stochastically increasing in Z\. By condition (El), 
—tG lt (t) is strictly decreasing in t and 1 + tG^ t (t) / G^ t (t) is nonincreasing 
in $t$; thus (9.14) is positive. Hence $E[U(X; \psi_*)] > 0$. This implies that $\beta_{*1} > 0$ and (ii) follows. A similar proof can be used to establish (iii). □

Proof of Proposition 5. If $\gamma_0 = 0$, the proof follows from standard results for the Cox model. Hence, assume $\gamma_0 > 0$. Without loss of generality also assume $E[Z] = 0$. The score for $\beta$ is

(9 - 15) E U r " E[Y(t)e^] 

Note that the derivative of this with respect to $\beta$ is negative definite; thus, a zero of (9.15) would be the unique maximizer of the profile likelihood (profiling over $A$). Note that
E\J3' ZY{t)] E[(3> ZY(t)eP'o z G° (H^(t))] E[P' ZY(t)e^ z ] 



Y(t)e^ z G o l0 (H^(t))dA (t) 



E[Y(t)] 



E[Y{t)e^ z ] 



E[Y{t)e^ z G° Q {H^{t))} 

since G° (H^°(t)) is decreasing in (3' Z but e& z 'G° {H^° (t)) is increasing m 
f3' Z by condition (El). Thus, 

Pih- E lZ { ?tl ] \Y(ty» z G° (fl* ( t) ) d A (t) 



E 



E[Y{t)e^ z ] 
«i/3o, where a\ G (0, 1). 







for /?* 

To evaluate the score at (3 = (3* in directions orthogonal to (3q , let (3\ = [I — 
RPoiP'oRpo)- 1 ^, where R = var[Z] and u G R d . Then E[f3[Z Y(t)e& z x 
GP,(f^°(t))tL4o(t)|/%Z] = since E[/3[Z\P' Z} = by condition (A3"). Sim- 
ilarly, E\fi\ZY{t)e^'* z \ = 0. Since this is true for any u£R d , (3* = ctiA) is 
indeed the unique maximizer of the profile likelihood. □ 

Proof of Proposition 6. When $\gamma_0 = 0$, the result follows since the Cox model is a valid submodel for any of the proportional hazards frailty models, and consistency has been established in Proposition 4. Assume $\gamma_0 > 0$ and let $K_\gamma$ and $\dot K_\gamma$ be as defined above in the proof of Theorem 7. The expected score for $\beta$, profiling over $A$, now has the form



E 



(9.16) 



Z 



E[Zy(t)e^ z if 7 (£,ff^(VQ)] | 
" E[Y(t)ef 3 ' z K 1 (5,Hi>{V))} J 

xY{t)e^ z G° 10 (H^(t))dA a {t) 



where A(t) solves 



A(t) 



E[Y( s y» z G° {H^{s))} 



E[Y{s)eP' z K 1 {5,H^{V))} 



dA (s) 
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and where A(t) < oo by Lemma 1. Let (3( a , c ) = a fto + c /5i, where (3\ = 
[I- RP^RPo)- 1 ^, with R = var[Z] and u £ R d , as in Proposition 5. De- 
note ift( a ,c) = (7)/3( a ,c))^)- After multiplying by the expected score (9.16) 
becomes 

E[/3^y(t)e /3 («-) z /? 7 (5, irW) (y))] 



E[y(t)e^- c ) if 7 (5,i/^. c )(y))] 
(9.17) 

xy(t)^G^(fl^°(t))cL4i(i). 

We now evaluate g(a,c) = E|/3iZy(*)e^«.<0 Z i: 7 ($,i^<«».<0 (V))]. Note that 
<?(a,0) =0 by previous arguments, and 

MpA = E[(p[Z) 2 Y(t)A^ Z '{KJS, H^) (V)) 
oc 

+ i^C«,c) (F)i^ 7 (5, flV(a,c) (y))}] > 0, 

since K^{t) + tK^(t) > for all < i < oo by condition (El). This now 
implies (9.17) < 0, which means that the expected score (9.16) is positive in 
the direction —f3\. 

Hence, (3* = af3o for some a € R. Note that if a < 0, then 

E[(3' ZY(t)e af3 '° z K^ (5, H^°) (V))} < E[/3' ZY(t)K^ (5, H^°) (V))] 

by condition (El). However, E[f3' Z{5 - A(V)K^(d,H^m(V))}) > by 
arguments used in the proof of Theorem 7. Thus, (9.16) is strictly positive 
if /?* = a[3o and a < 0. Hence a > 0. □ 

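Proof of Proposition 7. By condition (E2),

$\lim_{\gamma \downarrow 0} \dot G^{(1)}_\gamma(t) = \lim_{\mathrm{var}[W] \downarrow 0} \frac{E[[W - 1]e^{-Wt}]}{E[e^{-Wt}]\,\mathrm{var}[W]} = \lim_{\mathrm{var}[W] \downarrow 0} \frac{E[[W - 1](1 - [W - 1]t)]}{\mathrm{var}[W]} = -t$

and, arguing similarly,

$\lim_{\gamma \downarrow 0} G^{(1)}_\gamma(t) = \lim_{\mathrm{var}[W] \downarrow 0} \frac{-\log(E[e^{-Wt}]e^{t})}{\mathrm{var}[W]} = \lim_{\mathrm{var}[W] \downarrow 0} \frac{-\log(1 + \mathrm{var}[W]t^2/2)}{\mathrm{var}[W]} = -\frac{t^2}{2}.$

-E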
r o : 70 = 

e 2a ^'o z Al(V) 



By Proposition 5, the score test for $H_0 : \gamma_0 = 0$ thus has limiting expectation
where




t^[Y{s)e^ z G° Q {H^{s))} 
E[Y(s)e a ^'o z ] 



dAo(s). 



Since $A_*^2(V)/2 = \int_0^\tau Y(s) A_*(s)\,dA_*(s)$, the score expectation becomes



This clearly is equal to 0 when $\gamma_0 = 0$. Validity under contiguous alternatives follows from the regularity of the estimators under the correct model as established in Theorem 5. □
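
The two limits at the start of this proof can be illustrated with the gamma frailty family, for which $G_\gamma(t) = \log(1 + \gamma t)/\gamma$ and $G_0(t) = t$. The sketch below, which assumes this one family and uses hypothetical function names, evaluates the one-sided difference quotients in $\gamma$ at 0 and compares them with $-t^2/2$ and $-t$.

```python
import math

def g(t, gamma):
    """G_gamma(t) for the mean-one gamma frailty with variance gamma > 0."""
    return math.log1p(gamma * t) / gamma

def g_dot(t, gamma):
    """t-derivative of G_gamma(t)."""
    return 1.0 / (1.0 + gamma * t)

def slope_in_gamma_at_zero(func, limit_value, t, gamma):
    """Difference quotient (func(t, gamma) - limit_value) / gamma, where
    limit_value is the gamma -> 0 limit of func(t, gamma)."""
    return (func(t, gamma) - limit_value) / gamma

t = 2.0
for gamma in (1e-2, 1e-3, 1e-4):
    d_g = slope_in_gamma_at_zero(g, t, t, gamma)             # G_0(t) = t
    d_g_dot = slope_in_gamma_at_zero(g_dot, 1.0, t, gamma)   # dG_0/dt = 1
    print(gamma, d_g, -t * t / 2.0, d_g_dot, -t)
```

As $\gamma$ shrinks, the first quotient approaches $-t^2/2$ and the second approaches $-t$, in line with the limits above.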